Category Archives: Microformats

Schema-org, microformats and more science please

A normal conversation in the GovUK (or any office I frequent) today went*: “Can we get some microformats on that page?”, I suggest as I spot a section of our site outputting a boat-load of addresses. “No problem – but what’s this about schema-org?”. “Yeah, yeah.. we can hedge our bets and throw their mark-up in there too, it’s just some extra itemprops. *flippant scoff* I’ll send you a complete snippet example, because I’m just nice like that.”

And that’s what I did. And it looked like this:


<div class="vcard" itemscope itemtype="http://schema.org/Organization">
  <p class="org" itemprop="name">Department for Transport</p>
  <p class="adr" itemprop="address" itemscope 

itemtype="http://schema.org/PostalAddress"> <span itemprop="streetAddress"> <span class="extended-address">Great Minster House</span> <span class="street-address">76 Marsham Street</span> </span> <span class="locality" itemprop="addressLocality">London</span> <span class="postcode" itemprop="postalCode">SW1P 4DR</span> </p> <p>Telephone: <span class="tel"
itemprop="telephone">0300 330 3000</span></p> <p>Website: <a href="http://www.dft.gov.uk" class="url"
itemprop="url">www.dft.gov.uk</a></p> <p>Email: <a
href="mailto:firstname.surname@dft.gsi.gov.uk"
class="email" itemprop="email">firstname.surname@dft.gsi.gov.uk</a></p> </div>

Holy massive-code-snippet batman. I was surprised by the size. I know, I can feel people digging up links already on attack and defence of “bloat” when using microformats alone, but seriously guys, IT’S HUGE. I felt guilty saying “this is what you’ve gotta add to get this mark-up to mean something“. Here’s a more broken down comparison:

Here’s the address, raw, at just over a tweet’s worth (167 chars):


Department for Transport
Great Minster House
76 Marsham Street
SW1P 4DR
Telephone: 0300 330 3000
Website: http://www.dft.gov.uk
Email: firstname.surname@dft.gsi.gov.uk

Here’s the address with the elements on it to get at the separate pieces of the address, bringing us up to 356:


<p>Department for Transport</p>
<p>
  <span>Great Minster House</span>
  <span>76 Marsham Street</span>
  <span>London</span>
  <span>SW1P 4DR</span>
</p>

<p>Telephone: 0300 330 3000</p>
<p>Website: <a href="http://www.dft.gov.uk">www.dft.gov.uk</a></p>
<p>Email: <a href="mailto:firstname.surname@dft.gsi.gov.uk" 
>firstname.surname@dft.gsi.gov.uk</a></p>

Now let’s throw some classes on to those and get a bit of meaning in there (I mean, you may want to style them up, get things on new lines etc etc. so using the microformat classes are handy for that alone.**). We’ve got a vCard, people! (565):


<div class="vcard">
  <p class="org">Department for Transport</p>
  <p class="adr">
    <span class="extended-address">Great Minster House</span>
    <span class="street-address">76 Marsham Street</span>

    <span class="locality">London</span>
    <span class="postcode">SW1P 4DR</span>
  </p>

    <p>Telephone: <span 

class="tel">0300 330 3000</span></p> <p>Website: <a href="http://www.dft.gov.uk"
class="url">www.dft.gov.uk</a></p> <p>Email: <a
href="mailto:firstname.surname@dft.gsi.gov.uk"
class="email>firstname.surname@dft.gsi.gov.uk</a></p> </div>

And now let’s make it schema-org friendly using microdata (863):


<div class="vcard" itemscope itemtype="http://schema.org/Organization">
  <p class="org" itemprop="name">Department for Transport</p>
  <p class="adr" itemprop="address" itemscope 

itemtype="http://schema.org/PostalAddress"> <span itemprop="streetAddress"> <span class="extended-address">Great Minster House</span> <span class="street-address">76 Marsham Street</span> </span> <span class="locality" itemprop="addressLocality">London</span> <span class="postcode" itemprop="postalCode">SW1P 4DR</span> </p> <p>Telephone: <span class="tel"
itemprop="telephone">0300 330 3000</span></p> <p>Website: <a href="http://www.dft.gov.uk" class="url"
itemprop="url">www.dft.gov.uk</a></p> <p>Email: <a
href="mailto:firstname.surname@dft.gsi.gov.uk"
class="email" itemprop="email">firstname.surname@dft.gsi.gov.uk</a></p> </div>

And we’re done. All I wanted to do was say “this, dear Computer, is an address”. Just getting some frankly useless out-of-the-box HTML elements on the raw data more than doubles it’s size (167 to 356), then we double it again to actually make it useful.

Now, I know size isn’t everything, and this is a pedantic, slightly silly, and probably less than accurate example. We’re not crazy obsessed with keeping our pages below a certain size anymore (Ah… I remember back when the BBC S&Gs insisted that every page had to be less than 200k down the wire including script and CSS AND images. Those were the days.), but it’s not something to be sniffed at either. Particularly with mark-up. Increased size probably suggests increased complexity – more work for everyone, more chance of someone bungling the order or nesting, more simply “I can’t be bothered”. Colour me dubious. I just want to highlight how much we add on to HTML to make it actually do what we need.

Itemscope and itemtype, a brief diversion

I had one of those Am I crazy, but why are there two properties on these things? moments. When would you ever use one without the other? The spec says you can use itemscope alone, but without itemtype, it’s a bit meaningless. I think I’d do away with itemscope and have itemtype only but with a value, either a URI or something meaningful to the internal vocabulary. itemscope seems to exist solely to say “the things in side me are related”, but by the very nature of it being the parent of those items, that’s already implied, and with a class name of something meaningful (say, hcard), or just the itemtype (with a useful value), explicit to data consumers.

This isn’t sarcasm: I would gratefully receive an explanation as to why there are two attributes instead of one.

Back in the room: Is this seriously what we expect authors to do?

I think I’m still struggling to understand why microdata is a separate specification (or even exists if it’s not being used as a mechanism to get stuff into HTML long-term). You can achieve exactly this richness with the current attributes supplied in HTML, and I don’t even mean just the microformats class way. The data- attribute is pretty handy, though, and seems ripe for stuffing with machine data (why shouldn’t it take a URI if you really need it?).

But I digress.

Microdata with schema-org is solving a problem we’ve already solved in microformats, but in an equally not-quite-there way (having to specify itemtype with a URI more than once in a page for items that are the same, but not within the same parent, feels filthy, for example). They are just as bad as each other, in slightly varying ways. Useful for proving a point, allowing growth and putting out examples (not that all of these bonuses are currently being made the best of), but crappy if this is all we can muster for the long-term, high-volume, regularly published, data representation patterns in HTML. We’re asking authors to jump through hoops still for things they shouldn’t have to.

Microformats, schema-org, whatever… is this really our game plan now? Just keep throwing ever more bloat into already creaking elements when you just want to do something really common? What’s the strategy for getting this stuff out of this mess and into the language?

You might be asking why bother aiming to get those stronger patterns into HTML, if this mechanism basically works for getting a machine to figure out what the hell you’re trying to say, but you may as well be asking why you have any semantically meaningful elements in HTML at all if that’s the case. HTML version 5 is redefining some elements to have better semantic meaning because HTML is the language of authors, and to authors and consumers meaning matters.

Without a plan for gathering evidence for popularly used patterns directly from microformats or microdata (and using them as formal methods of research, testing and development), or what people (actual, real developers – not just the big search engines) are doing in general, we’ll end up with no progress or the wrong progress in HTML, and I believe that a formal process for how and when this happens should be made (i.e. definitions of what constitutes critical mass of common patterns, how the information should be gathered, how they will be proposed formally in the WG and promoted into the language proper, etc.).

I want evidence-based HTML that will evolve using clearly defined mechanisms.

*Conversation shortened and re-written with an artistic license and possibly some (many; “nice” may be a stretch) inaccuracies.

**Yes, I’m casually suggesting that microformats are “free” if all you want to do is get your stuff out there with the minimum you’ll need to be machine-friendly and human-eyes-pretty.

Gold-plating the cow paths

I was quoted a couple of weeks ago as saying, albeit in private, the following:

“HTML fails to be simple if it can’t provide what authors regularly need and end up turning to other encodings” — @phae

@slightlylate

For context, that was in response to a remark made by a friend that HTML fails if authors can’t use it because it has become too complex and attempts to describe too much. My response was that it fails not because it’s complicated, but when an author cannot express their content accurately with the toolkit they’re supplied and have to go to another encoding to find what they’re looking for. That’s the language passing the buck, in my opinion.

Don’t get me wrong – I’m not suggesting HTML should cover every niche semantic everyone is ever going to want to express ever. That would be crazy and confusing. HTML should express what is most commonly used, and at the moment it doesn’t – which is why we still see microformats, microdata, component model, schema.org etc. trying to fill the gaps. And not just trying to fill the gaps, but trying to provide data on which decisions can be made about what should be in HTML.

HTML, and a platform that provides, should be the end goal. Microformats, et al., are the research grounds that should be directly contributing with the evidence and data they are able to garner. In fact, the most popular microformats, shown through demand and usage, should just be in HTML as a standard, by being provided for with semantically appropriate new elements.

We’ve seen this work. Microformats started doing things with dates, most specifically, hCalendar. It had a slightly cludgy way of marking up time, using abbr. The accessibility lot were rightfully less than impressed, and other patterns were tried – title and spans and all kinds of things. But in short, it was shown that time gets talked about a lot, and we needed something better. We got <time> in HTML. Hooray! The system works! Well, except when it doesn’t. Go read Bruce Lawson’s take, as the powers that be removed time and replaced it with data. Gee, thanks.

We shouldn’t expect authors to go in search of richer mark-up from other sources when what they’re trying to do is really common, when a need has been shown, and a pattern has been proven.

SXSWi 2011 Microformats panel

It is that time of year again: SXSWi panel pimpage! I’ve put together a somewhat vague panel proposal on behalf of microformats.org and I would appreciate it if you could give it a vote.

Apparently voting only counts towards a relatively small percentage (30%) of whether or not it will be selected, but with 2346 proposals in the system, I suspect it counts a lot more than that.

The session is rather vaguely defined because I’m not really sure right now what’ll still be interesting in a few months. I also want to garner as many opinions from the community as they can about what they want to know more about, see speak or show off – so do make your voice heard in the comments.

SXSW submissions are a bit nuts, really.

The mega-conference happens in March every year. By the time you’re done clearing your credit card bill and the fuss on twitter has died down a few weeks after the event, it’s already time to submit proposals for the coming year with the deadline at the start of July.

That means you need to think about your proposal a good 9 or 10 months before the next event.

In my mind, it’s incredibly difficult to predict what will be a hot topic or really relevant 10 months down the line in an industry like ours. Things move incredibly quickly. I also find it very difficult to know what to vote for – I may find at the beginning of next year that actually, I really could have done with knowing more about The Latest Technique, but right now I don’t know what it is to vote for it.

I also worry that interesting topics that I don’t know about yet don’t have the community around it to rally support and get the votes. Inevitably, the topics that are most trendy or have the most well-known organisers/panelists will be the topics that get the most votes. They tend not to be the panels I’ve enjoyed the most, though. Unfortunately, it’s becoming increasingly hard to figure out which sessions are going to be great and which aren’t, since SXSW is just so big now – I think it has become quantity over quality. </ complain>

Anyway, not a lot I can do about that other than play along and attempt to include a session that I will attempt to put together at a level that I deem acceptable quality. I do want to see microformats.org have a representation there, so help me out, huh?

p.s. The spelling of the tag “microformats” as “micoformats” is not mine. It’s theirs. And I asked to have it corrected, but apparently their system doesn’t easily allow for that at the moment. WTF?