HTML5 Microdata – Over-cooked?

What is Microdata?

Microdata is HTML5’s answer to how we should go about embedding machine-readable data in our mark-up.

At a high level, microdata consists of a group of name-value pairs. The groups are called items, and each name-value pair is a property. Items and properties are represented by regular elements.

A simple example looks something like this:

<div item>
 <p>My name is <span itemprop="name">Frances</span>.</p>
 <p>I work for the <span itemprop="company">BBC</span>.</p>
 <p>I am <span itemprop="nationality">British</span>.</p>
</div>

Here the item has three properties with values: name (Frances), company (BBC) and nationality (British).

With the subject attribute, you can also associate a property with an item that it is not a descendant of.

Essentially, you have some new attributes at your disposal:

  • item – to specify a group.
  • itemprop – to define the property of an element inside an item.
  • subject – to associate a property with an item that is not its ancestor.
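
As a rough sketch of the subject attribute described above, assuming the draft's rule that subject takes the ID of an element carrying item. The frances ID and the property names are made up for illustration, and the draft syntax may well change:

```html
<!-- The item: an element with the item attribute and an ID to point at. -->
<div item id="frances">
 <p>My name is <span itemprop="name">Frances</span>.</p>
</div>

<!-- Elsewhere on the page, outside that div. The subject attribute names
     the item's ID, so this property joins the same group of name-value pairs. -->
<p>I work for the <span subject="frances" itemprop="company">BBC</span>.</p>
```

Read this way, the item would end up with two properties (name:Frances, company:BBC), even though the second one is not nested inside the div.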

You can also type items with a URL, a reverse DNS label or a pre-defined type (and each itemprop can accept multiple property names, as you’d expect with class).

Here, the item is “”:

<section item="">
 <h1 itemprop="">Hedral</h1>
 <p itemprop="org.example.desc">Hedral is a male american domestic
 shorthair, with a fluffy black fur with white paws and belly.</p>
 <img itemprop="org.example.img" src="hedral.jpeg" alt="" title="Hedral, age 18 months">
</section>

In this example the “” item has three properties, an “” (“Hedral”), an “org.example.desc” (“Hedral is…”), and an “org.example.img” (“hedral.jpeg”).

Quotes and examples (slightly personalised) come from the HTML5 working draft.

My reservations

My gut instinct with microdata is that it’s overcomplicating things. We have RDFa already if you really want to get into the nitty-gritty of machine-readable data and, dare I say it, microformats and good semantic practice for creating shared vocabularies for plain-old semantic HTML. I’m not sure HTML5 necessarily needs this sort of extra solution.

The last example above, with the reverse DNS typing, just looks so… heavy. Something about it just doesn’t feel right and its actual value to me remains unclear, or at least I can’t see the value of specifying the path on each element. Couldn’t that be inferred from the structure, with subject used where ambiguities appear, and the full path specified on each element only as a last resort?

<section item="">
 <h1 itemprop="name">Hedral</h1>
 <p itemprop="desc">Hedral is a male american domestic
 shorthair, with a fluffy black fur with white paws and belly.</p>
 <img itemprop="img" src="hedral.jpeg" alt="" title="Hedral, age 18 months">
</section>

The itemprop attribute bothers me most. I can’t help but think that all the examples shown in the draft would still work if itemprop was replaced with class. The class attribute is already designed to take a semantically rich term for the element. Worse still, assuming class is used appropriately, you’ll end up with unnecessary repetition across the attributes.

<div item>
 <p>My name is <span class="name" itemprop="name">Frances</span>.</p>
</div>

The subject attribute examples in the draft aren’t great, which doesn’t help their case: they don’t seem very real-world (although there are plenty of good reasons why you might need subject; just look at the microformats include-pattern, and how that would be improved with it). A few of the examples could be marked up better, with the relationships then inferred from the element structure. I wouldn’t mind, but HTML5 already offers a boat-load of new elements to take away much of the ambiguity that HTML4 had; sections and headers alone go a long way to tying information notionally together.

The microdata proposal seems to be about making explicit what could otherwise already be inferred from the actual elements and values (although I’ll concede that such inference is often inaccurate or very difficult). Wanting to be exact isn’t a terrible idea (it works really well for the for attribute, for example) and I do like disambiguation. I just don’t think the current proposal really solves the right problems as it stands.
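
For comparison, this is the kind of explicit link that for already provides in HTML. The email ID and the field itself are made up for illustration:

```html
<!-- The for attribute on the label names the id of the control it describes,
     so the two are tied together even though neither contains the other. -->
<label for="email">Email address</label>
<input type="text" id="email" name="email">
```

Nothing has to be inferred from the structure here: the relationship is stated once, by ID, and that is all.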

I do think that subject has the most legs of the new attributes, but surely it could be as simple as:

<div id="about">
<p>I'm Frances and I like to complain about things on the internet.</p>
</div>
<p subject="about">I own no cats. :(</p>

Let subject do what for has done for label, but across all elements, tying wayward bits of information to an ID (or maybe simply use subject alone to tie pieces of information together, but then this starts to feel like a job for class again).

Or an example with class in place of itemprop and using a pre-defined vocabulary:

<div id="vcard">
<p>I'm <span class="fn">Frances</span> and I like to complain about things on the internet.</p>
</div>
<p subject="vcard">I still own no cats. :( I do work for the <span class="company">BBC</span> though.</p>

My final concern, which actually could apply to HTML5 as a whole and is more of a general “are we ready for this yet?” thought, is that this is a lot for an author to consider. You look at the web as it stands now, and most of it isn’t well written. Elements are abused, misused or completely forgotten (and attributes fare worse).

HTML5 offers a raft of new elements and attributes to aid clarity in information, accessibility and flexibility. Do we really think that authors on the whole have a great track-record of implementing the specs well? These new microdata attributes make what could already be a simple lesson (use class meaningfully) into a much steeper learning curve, watering down the overall benefit.

I’m not suggesting that that should be an excuse to not make HTML5 as rich as possible, but it should always be in mind that the web is about enabling normal people to share information – it’s not just an intellectual experiment for web developers.

Microdata is in the early draft stage – so I realise things will change.


It’s well known that I’m a microformats busy-body, but this has nothing to do with my distaste for microdata as the spec stands. Sure, the two things have similar aims, but microformats has always been a solution for the here-and-now. HTML5 still “supports” microformats, and when HTML5 is ready, microformats will simplify (using the time element can’t happen soon enough) and continue to do what they have always done. I like HTML5 and want it to succeed. I am in no way endorsing microformats over microdata or generally comparing the two.

24 thoughts on “HTML5 Microdata – Over-cooked?”

  8. I agree that there is much stuff in HTML5 of questionable relevance and much material which is arguably duplicating functionality we can achieve with microformats and RDF.

  10. “My final concern, which actually could apply to HTML5 as a whole and is more of a general are we ready for this yet?”

    Possibly not, but HTML5 in that sense, is not about ‘now’, but about where we will be in something like five years time.
    Sure, early-adopters have started using HTML5 already, but main-stream use (by authors) is going to have to wait for main-stream acceptance by browsers, and that isn’t now.

  11. Please don’t hesitate to send your feedback to one of the lists — the addresses are in the top of the spec in the “status of this document” section, or you can mail me directly at:

  12. @Mike – browsers are already implementing a lot, so I don’t think anyone has to do much in the way of waiting.

    @Ian – Thanks. I will try to. This was really just my instant gut reaction rather than a proper critique. I’m going to have a bit of a play about (I’m writing a HTML5 version of my current theme at the moment) and actually see how it feels and then get back to the list.

    I have also since found your post to the mailing list about how microdata came about and was designed, which was fascinating in itself.
