HTML5 Microdata - Over-cooked?

May 24, 2009

What is Microdata?

Microdata is HTML5's answer to how we should go about embedding machine-readable data in our mark-up.

At a high level, microdata consists of a group of name-value pairs. The groups are called items, and each name-value pair is a property. Items and properties are represented by regular elements.

A simple example looks something like this:


<div item>
 <p>My name is <span itemprop="name">Frances</span>.</p>
 <p>I work for the <span itemprop="company">BBC</span>.</p>
 <p>I am <span itemprop="nationality">British</span>.</p>
</div>

Here the item has three properties with values (name: Frances, company: BBC, nationality: British).
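
To make the "group of name-value pairs" idea concrete, here is a rough sketch of how a consumer might read that item back out. It uses Python's standard html.parser; the ItemParser class and read_item function are names I've made up for illustration, and it only copes with a single flat item whose elements are all explicitly closed. It is not the extraction algorithm from the draft.

```python
from html.parser import HTMLParser


class ItemParser(HTMLParser):
    """Collect itemprop name-value pairs from one flat item (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._open = []  # itemprop value (or None) for each open element

    def handle_starttag(self, tag, attrs):
        self._open.append(dict(attrs).get("itemprop"))

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text only counts when its immediate parent element carries itemprop.
        name = self._open[-1] if self._open else None
        if name:
            self.properties[name] = data


def read_item(html):
    parser = ItemParser()
    parser.feed(html)
    parser.close()
    return parser.properties


markup = """
<div item>
 <p>My name is <span itemprop="name">Frances</span>.</p>
 <p>I work for the <span itemprop="company">BBC</span>.</p>
 <p>I am <span itemprop="nationality">British</span>.</p>
</div>
"""

print(read_item(markup))
# {'name': 'Frances', 'company': 'BBC', 'nationality': 'British'}
```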

You can also associate a property with an item that it is not a descendant of, using the subject attribute.

Essentially, you have some new attributes at your disposal:

item - to specify a group.
itemprop - to define the property of an element inside an item.
subject - to associate a property with a non-parent item.

You can also type items with a URL, reverse DNS labels or a pre-defined type (and each itemprop can accept multiple properties, as you'd expect with class):

Here, the item is "org.example.animal.cat":
<section item="org.example.animal.cat">
 <h1 itemprop="org.example.name">Hedral</h1>
 <p itemprop="org.example.desc">Hedral is a male american domestic
 shorthair, with a fluffy black fur with white paws and belly.</p>
 <img itemprop="org.example.img" src="hedral.jpeg" alt="" title="Hedral, age 18 months">
</section>
In this example the "org.example.animal.cat" item has three properties, an "org.example.name" ("Hedral"), an "org.example.desc" ("Hedral is..."), and an "org.example.img" ("hedral.jpeg").

Quotes and examples (slightly personalised) come from the HTML5 working draft.
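
The following is my own sketch rather than anything from the draft. It reads the typed item above and shows the one wrinkle in that example: the img property takes its value from the src attribute rather than from text content. TypedItemParser, read_typed_item and the VOID set are names I've invented, and the parser assumes a single, un-nested item.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag (a partial list, enough for this sketch).
VOID = {"img", "br", "hr", "input", "meta", "link"}


class TypedItemParser(HTMLParser):
    """Read one typed item: its type plus its itemprop values (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._open = []  # itemprop value (or None) for each open element

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "item" in a and self.item_type is None:
            self.item_type = a["item"]
        name = a.get("itemprop")
        if tag in VOID:
            # An img has no text content, so take its value from src.
            if name and tag == "img":
                self.properties[name] = a.get("src", "")
            return
        self._open.append(name)

    def handle_endtag(self, tag):
        if tag not in VOID and self._open:
            self._open.pop()

    def handle_data(self, data):
        name = self._open[-1] if self._open else None
        if name:
            self.properties[name] = self.properties.get(name, "") + data


def read_typed_item(html):
    parser = TypedItemParser()
    parser.feed(html)
    parser.close()
    # Collapse the line breaks inside longer values.
    props = {k: " ".join(v.split()) for k, v in parser.properties.items()}
    return parser.item_type, props


markup = """
<section item="org.example.animal.cat">
 <h1 itemprop="org.example.name">Hedral</h1>
 <p itemprop="org.example.desc">Hedral is a male american domestic
 shorthair, with a fluffy black fur with white paws and belly.</p>
 <img itemprop="org.example.img" src="hedral.jpeg" alt="" title="Hedral, age 18 months">
</section>
"""

item_type, props = read_typed_item(markup)
print(item_type)                  # org.example.animal.cat
print(props["org.example.name"])  # Hedral
print(props["org.example.img"])   # hedral.jpeg
```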

My reservations

My gut instinct with microdata is that it's overcomplicating things. We already have RDFa if you really want to get into the nitty-gritty of machine-readable data and, dare I say it, microformats and good semantic practice for building shared vocabularies on plain-old semantic HTML. I'm not sure HTML5 necessarily needs this sort of extra solution.

The last example above, with the reverse DNS typing, just looks so... heavy. Something about it just doesn't feel right and its actual value to me remains unclear, or at least I can't see the value of specifying the path on each element. Couldn't the path be inferred from the structure, with subject used where ambiguities appear, and spelled out on each element only as a last resort?


<section item="org.example.animal.cat">
 <h1 itemprop="name">Hedral</h1>
 <p itemprop="desc">Hedral is a male american domestic
 shorthair, with a fluffy black fur with white paws and belly.</p>
 <img itemprop="img" src="hedral.jpeg" alt="" title="Hedral, age 18 months">
</section>

The itemprop attribute bothers me most. I can't help but think that all the examples shown in the draft would still work if itemprop was replaced with class. The class attribute is already designed to take a semantically rich term for the element. Worse still, assuming class is used appropriately, you'll end up with unnecessary repetition across the attributes.


<div item>
 <p>My name is <span class="name" itemprop="name">Frances</span>.</p>
...
</div>
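
To back up the claim that class could do the same job, here is a small sketch of an extractor that takes the attribute name as a parameter. AttrExtractor and extract are names I've made up, and the code is only meant to show that, for a simple item, the same pairs come out whether the names live in itemprop or in class.

```python
from html.parser import HTMLParser


class AttrExtractor(HTMLParser):
    """Read property names from a configurable attribute (illustrative only)."""

    def __init__(self, attr):
        super().__init__()
        self.attr = attr
        self.pairs = {}
        self._open = []  # list of property names for each open element

    def handle_starttag(self, tag, attrs):
        value = dict(attrs).get(self.attr) or ""
        # Both class and itemprop may hold several space-separated names.
        self._open.append(value.split())

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        if self._open:
            for name in self._open[-1]:
                self.pairs[name] = data.strip()


def extract(html, attr):
    parser = AttrExtractor(attr)
    parser.feed(html)
    parser.close()
    return parser.pairs


with_itemprop = '<div item><p>My name is <span itemprop="name">Frances</span>.</p></div>'
with_class = '<div item><p>My name is <span class="name">Frances</span>.</p></div>'

print(extract(with_itemprop, "itemprop"))  # {'name': 'Frances'}
print(extract(with_class, "class") == extract(with_itemprop, "itemprop"))  # True
```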

The subject attribute examples aren't great, which doesn't help its case: they don't seem very real-world. (There are plenty of good reasons why you might need subject, though. Just look at the microformats include-pattern, for example, and how subject would improve it.) A few of the examples could be better represented, with the relationships then inferred from the element structure. I wouldn't mind, but HTML5 already offers a boat-load of new elements to take away much of the ambiguity that HTML4 had, and sections and headers alone go a long way towards tying information notionally together.

The microdata proposal seems to be about making explicit what could otherwise already be inferred from the actual elements and values (although I'll concede that such inference is often inaccurate or very difficult). Wanting to be exact isn't a terrible idea (it works really well for the for attribute, for example) and I do like disambiguation. I just don't think the current proposal really solves the right problems as it stands.

I do think that subject has the most legs of the new attributes, but surely it could be as simple as:


<div id="about">
<p>I'm Frances and I like to complain about things on the internet.</p>
</div>
...
<p subject="about">I own no cats. :(</p>

Let the subject do what for has done for label, but across all elements, tying wayward bits of information to an ID (or maybe simply use subject alone to tie pieces of information together - but then this starts to feel like a class job again).

Or an example with class in place of itemprop and using a pre-defined vocabulary:


<div id="vcard">
<p>I'm <span class="fn">Frances</span> and I like to complain about things on the internet.</p>
</div>
...
<p subject="vcard">I still own no cats. :( I do work for the <span class="company">BBC</span> though. </p>
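
A rough sketch of how a parser might resolve that last example, treating subject as a pointer to an ID in the way for points a label at its control, and class as the property name. This is my own hypothetical behaviour, not anything in the draft: SubjectResolver and resolve are invented names, and treating any element with an id as a group is an assumption made purely to keep the sketch short.

```python
from collections import defaultdict
from html.parser import HTMLParser


class SubjectResolver(HTMLParser):
    """Group class-named values under the id that subject points at (hypothetical)."""

    def __init__(self):
        super().__init__()
        self.items = defaultdict(dict)  # group id -> {property: value}
        self._open = []  # (group id, property name) for each open element

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        inherited = self._open[-1][0] if self._open else None
        # An explicit subject or id names the group; otherwise inherit the parent's.
        group = a.get("subject") or a.get("id") or inherited
        self._open.append((group, a.get("class")))

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        if not self._open:
            return
        group, prop = self._open[-1]
        if group and prop:
            self.items[group][prop] = data.strip()


def resolve(html):
    parser = SubjectResolver()
    parser.feed(html)
    parser.close()
    return dict(parser.items)


markup = """
<div id="vcard">
<p>I'm <span class="fn">Frances</span> and I like to complain about things on the internet.</p>
</div>
<p subject="vcard">I still own no cats. :( I do work for the <span class="company">BBC</span> though. </p>
"""

print(resolve(markup))
# {'vcard': {'fn': 'Frances', 'company': 'BBC'}}
```

Both properties end up under the one "vcard" group, even though the second paragraph sits outside the div.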

My final concern, which actually could apply to HTML5 as a whole and is more of a general "are we ready for this yet?" thought, is that this is a lot for an author to consider. You look at the web as it stands now, and most of it isn't well written. Elements are abused, misused or completely forgotten (and attributes fare worse).

HTML5 offers a raft of new elements and attributes to aid clarity in information, accessibility and flexibility. Do we really think that authors on the whole have a great track record of implementing specs well? These new microdata attributes turn what could be a simple lesson (use class meaningfully) into a much steeper learning curve, watering down the overall benefit.

I'm not suggesting that that should be an excuse to not make HTML5 as rich as possible, but it should always be in mind that the web is about enabling normal people to share information - it's not just an intellectual experiment for web developers.

Microdata is in the early draft stage - so I realise things will change.

Disclaimer

It's well known that I'm a microformats busy-body, but this has nothing to do with my distaste for microdata as the spec stands. Sure, the two things have similar aims, but microformats has always been a solution for the here-and-now. HTML5 still "supports" microformats, and when HTML5 is ready, microformats will simplify (using the time element can't happen soon enough) and continue to do what they have always done. I like HTML5 and want it to succeed. I am in no way promoting microformats over microdata or generally comparing the two.