Support for structured data records is virtually non-existent in MediaWiki. It is sorely missed on Wikipedia, though, and there have been several attempts to implement it: full-blown systems like WikiData or Semantic MediaWiki, and also some more limited extensions like the Data Extension, PageAttributes, DataTable, or Infobox_Data_Capture.
To add my 2¢ to all that, here's what I came up with when thinking about the issue:
We would want a simple, flexible, minimalist approach that can be tested non-intrusively on Wikipedia, and which can be expanded in use over time. Conceptually, I would limit this to plain properties assigned to entities, where a page would represent exactly one entity. The set of possible properties would be divided into groups, which correspond to types or aspects of entities: examples would be person.firstName, book.title, place.longitude, etc.
Obvious applications of this would be to capture existing structured data from infoboxes, taxoboxes, geotags, etc. One use I would like to suggest (and which I will write about in detail later, maybe) would be bibliographic records: make a Book namespace, and put a data record and a short description of the book there. Split it into an <includeonly> and a <noinclude> section, so it can be used as a template, especially for <cite> entries.
Properties would be assigned to the entity described by a page using a parser function called #property, which would produce no visible output. For example, basic data about Albert Einstein could be recorded like this:
{{#property:person.firstName|Albert}}
{{#property:person.lastName|Einstein}}
{{#property:person.dateOfBirth|1879-03-14}}
{{#property:person.placeOfBirth|Ulm, Germany}}
This could of course be done in a template that is used on the actual page about Einstein, for example wp:de:Template:Personendaten - that way, only a single template needs to be edited, and structured data about thousands of entities would start to flow into the database automatically.
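To make the proposed syntax concrete, here is a rough Python sketch of how such #property calls might be pulled out of wikitext. It is not how MediaWiki's parser works; the regular expression and the function name `extract_properties` are assumptions made for illustration, and the pattern assumes values contain no pipes or braces (so no nested templates or links).

```python
import re

# Hypothetical pattern for the proposed syntax {{#property:group.name|value}}.
# Assumes values contain no pipes or braces (no nested templates or links).
PROPERTY_CALL = re.compile(r"\{\{#property:(\w+)\.(\w+)\|([^|{}]*)\}\}")

def extract_properties(wikitext):
    """Return a (group, name, value) tuple for each #property call found."""
    return [(g, n, v.strip()) for g, n, v in PROPERTY_CALL.findall(wikitext)]

wikitext = """
{{#property:person.firstName|Albert}}
{{#property:person.lastName|Einstein}}
{{#property:person.dateOfBirth|1879-03-14}}
{{#property:person.placeOfBirth|Ulm, Germany}}
"""

for group, name, value in extract_properties(wikitext):
    print(group, name, value)
```

A real implementation would hook into the parser instead of using a regular expression, since template expansion has to happen before the property values are known.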
It might be useful to have a meta-page describing the meaning and use of the properties in each property group - in this example, maybe Wikipedia:Person properties. Also, it may be good to allow only a preconfigured set of properties to be used. In that case, a pattern could be defined that would allow MediaWiki to check the values supplied for each property: for example, for person.dateOfBirth, the regular expression -?\d{4}-\d{2}-\d{2} could be used to check for valid dates.
In order to utilize this data, some special pages would have to be implemented. A generic query mask would be a start, allowing users to query pages (and properties of pages) by existence of a property or group, and value of a property, possibly restricted further by namespace and category. But it would also be possible, and probably useful, to have application-specific query pages that can be used to retrieve pages about books, people, places, etc., with a specialized query mask and represented in a specialized form suitable for that kind of data.
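As a minimal sketch of such a check, assuming a hypothetical configuration that maps each allowed property to a pattern (only person.dateOfBirth and its regular expression come from the text; the names `ALLOWED_PROPERTIES` and `is_valid` are made up here):

```python
import re

# Hypothetical configuration: allowed property -> pattern its value must match.
ALLOWED_PROPERTIES = {
    "person.dateOfBirth": re.compile(r"-?\d{4}-\d{2}-\d{2}"),
}

def is_valid(prop, value):
    """True if the property is configured and the whole value matches its pattern."""
    pattern = ALLOWED_PROPERTIES.get(prop)
    return pattern is not None and pattern.fullmatch(value) is not None

print(is_valid("person.dateOfBirth", "1879-03-14"))     # well-formed date
print(is_valid("person.dateOfBirth", "14 March 1879"))  # wrong shape
print(is_valid("person.shoeSize", "42"))                # property not configured
```

Note that a pattern like this only checks the shape of the value: a string such as 1879-13-45 would still pass, so checking for real calendar dates would need more than a regular expression.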
The properties would be stored in a special database table (maybe called page_data, because page_props is already used for something else), with the following columns:
- page: the id of the page the properties are assigned to; that is, the subject
- group: the property group
- property: the name of the property
- value: the property's value
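The table and a simple lookup by property value could be sketched as follows, using Python's sqlite3 module purely for illustration (MediaWiki itself would use its own database layer). The column types, the page id 1, and the helper `pages_with` are assumptions; only the table and column names come from the list above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "group" is a reserved word in SQL, so it is quoted here; a real schema
# might prefer a different column name.
conn.execute("""
    CREATE TABLE page_data (
        page     INTEGER NOT NULL,
        "group"  TEXT    NOT NULL,
        property TEXT    NOT NULL,
        value    TEXT    NOT NULL
    )
""")

# Page id 1 is an arbitrary placeholder for the page about Albert Einstein.
rows = [
    (1, "person", "firstName", "Albert"),
    (1, "person", "lastName", "Einstein"),
    (1, "person", "dateOfBirth", "1879-03-14"),
    (1, "person", "placeOfBirth", "Ulm, Germany"),
]
conn.executemany("INSERT INTO page_data VALUES (?, ?, ?, ?)", rows)

def pages_with(group, prop, value):
    """Return ids of pages where the given property has the given value."""
    cur = conn.execute(
        'SELECT page FROM page_data '
        'WHERE "group" = ? AND property = ? AND value = ?',
        (group, prop, value),
    )
    return [row[0] for row in cur]

print(pages_with("person", "lastName", "Einstein"))
```

Queries by existence of a property or group would simply drop the condition on value, or on both property and value.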
It may be a good idea to allow comments or qualifications to values, especially if they are restricted by a pattern: for example, the population of a city could be given as {{#property:place.population|123456|in 2001}}, indicating the date the census data was collected. This could be stored in an extra comment column in the database. This could also be used for citing sources, giving units of measurement, indicating disputed figures, etc. Also, if we ever get flagged revisions, property records would have to be subject to those flags too - if we have a "current" and a "reviewed" version of a page, we should also have a "current" and a "reviewed" version of the corresponding data record.
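One way the optional third argument could be handled is sketched below. The function `parse_property_args` is hypothetical, and it assumes the comment is simply everything after the second pipe:

```python
def parse_property_args(call_body):
    """Split 'group.name|value|comment' into its parts; the comment is optional."""
    parts = call_body.split("|", 2)
    if len(parts) < 2:
        raise ValueError("expected at least a property name and a value")
    group, _, name = parts[0].partition(".")
    value = parts[1].strip()
    # The comment would go into the extra comment column, or NULL if absent.
    comment = parts[2].strip() if len(parts) == 3 else None
    return group.strip(), name.strip(), value, comment

print(parse_property_args("place.population|123456|in 2001"))
print(parse_property_args("person.firstName|Albert"))
```

Keeping the comment separate from the value means a pattern check on place.population can still insist on a plain number.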
Property data may also be saved to different tables depending on the property group: this may improve performance for much-used property groups, and it might also be used to allow sharing this data across wikis (though that entails several complex problems, like keeping the wikitext-version of the data synchronized).
To conclude, this way of associating structured data with articles would be simple to implement, and would fit well with the current way data is managed on Wikipedia. It could be tested in a restricted domain of application, like bibliographic or biographic records, and then be expanded over time to cover more and more things like geodata and other information from infoboxes.
Talk:WikiData light
better?
imho the name and group should go into a separate table.
Specific features
You mix several ideas; I'll try to separate them and mix in my own thoughts:
Frankly the benefit compared to templates is little in the beginning. Now we can already extract data from templates and analyze them offline, for instance at the toolserver. There will always be the need to first extract data from Wikipedia and then analyze and use it, so we should not try to implement too much intelligence in the MediaWiki core or a WikiData extension. My 2¢. -- JakobVoss 16:08, 3 June 2008 (CEST)
using it in multiple projects
How do you use it in srn.wikipedia or for the xh.wikipedia... and the over 120 Wikipedias that may not know Albert Einstein yet? Thanks, GerardM