Gold-plating the cow paths

I was quoted a couple of weeks ago as saying, albeit in private, the following:

“HTML fails to be simple if it can’t provide what authors regularly need and end up turning to other encodings” — @phae

@slightlylate

For context, that was in response to a remark made by a friend that HTML fails if authors can’t use it because it has become too complex and attempts to describe too much. My response was that it fails not because it’s complicated, but when an author cannot express their content accurately with the toolkit they’re supplied and have to go to another encoding to find what they’re looking for. That’s the language passing the buck, in my opinion.

Don’t get me wrong – I’m not suggesting HTML should cover every niche semantic everyone is ever going to want to express ever. That would be crazy and confusing. HTML should express what is most commonly used, and at the moment it doesn’t – which is why we still see microformats, microdata, component model, schema.org etc. trying to fill the gaps. And not just trying to fill the gaps, but trying to provide data on which decisions can be made about what should be in HTML.

HTML, as a platform that provides for its authors, should be the end goal. Microformats, et al., are the research grounds that should be directly contributing with the evidence and data they are able to garner. In fact, the most popular microformats, shown through demand and usage, should just be in HTML as a standard, by being provided for with semantically appropriate new elements.

We’ve seen this work. Microformats started doing things with dates, most specifically, hCalendar. It had a slightly kludgy way of marking up time, using abbr. The accessibility lot were rightfully less than impressed, and other patterns were tried – title and spans and all kinds of things. But in short, it was shown that time gets talked about a lot, and we needed something better. We got <time> in HTML. Hooray! The system works! Well, except when it doesn’t. Go read Bruce Lawson’s take, as the powers that be removed time and replaced it with data. Gee, thanks.

We shouldn’t expect authors to go in search of richer mark-up from other sources when what they’re trying to do is really common, when a need has been shown, and a pattern has been proven.

hgroups and sub-titles

I realise that queries or concerns about HTML 5 elements should make their way onto the WHATWG mailing list, but I just wanted to get a few thoughts out on here about what I’ve spent far too long discussing at work recently. It’s perfectly likely that I’ve totally got the wrong end of the proverbial, so this is just me trying to get my mind straight on why I feel something about this is unnatural and I welcome comments to help clarify or discuss.

So, hgroup, eh?

hgroup is one of the new elements featuring in the HTML 5 specification. Its purpose, quite simply, is to group two or more headings together into one block so that subheadings are treated differently and only the first heading becomes part of the document outline.

The hgroup element is typically used to group a set of one or more h1-h6 elements — to group, for example, a section title and an accompanying subtitle.

From the current HTML5 working draft

The WHATWG wiki has the following rationale for requiring the hgroup element:

The point of <hgroup> is to hide the subtitle from the outlining algorithm

WHATWG wiki

Over on HTML5 Doctor, John Allsopp appears to find fault with this element also and suggests that the requirement for hgroup is symptomatic of a flaw in the outlining algorithm. I can see his point, but I’m more concerned that it’s a fundamentally inaccurate use of a heading.

In my mind, headings are designed to denote sections. At least, that’s what they were used for in HTML 4. Things either went in a heading, because they denoted a new section of content, or they didn’t. This is Frances the idealist speaking, I realise this, but still.

Let’s say I had a new website about a children’s story about monsters, and I wanted to title it “Monsters live under my bed”, but it could also have a sub-title or strap-line. As an author, I either want my title to be “Monsters live under my bed. Where things go bump in the night” or I want it to be “Monsters live under my bed” and the next line is incidental and a supplementary strap-line and not something I would consider to be part of my title.

Currently, I might do any of the following:

<h1>Monsters live under my bed 
Where things go bump in the night
</h1>

Example wrapped for legibility, but my story title is the full text and is in a heading.

<h1>Monsters live under my bed 
<span>Where things go bump in the night</span>
</h1>

This one is mostly a stylistic example. The strapline needs to look like a strapline, so I’ve stuck a span around it (yeah, I know…), but fundamentally I’m still considering it to be part of the title. My story’s name is the full text.

<h1>Monsters live under my bed</h1>
<p>Where things go bump in the night</p>

In this case, the title of my story is only “Monsters live under my bed” and because HTML 4 doesn’t really offer a suitable element that I would consider “a sub header that isn’t a new section of the document” I’ve stuck the sub-title text in a paragraph.

<h1>Monsters live under my bed</h1>
<h2>Where things go bump in the night</h2>

This one suggests that I have a title and then the first chapter beneath the title is “Where things go bump in the night”. That second line is no longer the title of my kids’ story. The h2 would be a new indented item in an outline and would suggest that further within the document I may find more h2s and that I have stepped into the document by a level.

What HTML 5 says you should do if you want a sub-title/sub-heading is this:

<hgroup>
<h1>Monsters live under my bed</h1>
<h2>Where things go bump in the night</h2>
</hgroup>

This has the effect of making that h2 not appear in the outline, since it will no longer create a new section. The outline now considers that the title of my story is again “Monsters live under my bed”. Any content that comes after this would be within the section titled by the h1. The h2 doesn’t count as the start of a new section (as it would if there was no hgroup wrapper). The contents of the h2 are considered a special non-sectioning-heading case, but they’re still in a heading element. But if it’s meant to be a heading, why isn’t it in the h1? Gah!
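To make the difference between the last two examples concrete, here is a rough Python sketch of how a tool might collect outline entries. It is a toy, not the real outlining algorithm: it ignores sectioning elements entirely and just lists headings, and it assumes that an hgroup contributes only its highest-ranked heading (the first one, if there is a tie). The class and function names are made up for illustration.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OutlineSketch(HTMLParser):
    """Collects (rank, text) outline entries; a toy stand-in for the real algorithm."""

    def __init__(self):
        super().__init__()
        self.entries = []    # (rank, text) pairs that make it into the outline
        self._group = None   # headings seen inside the current hgroup, or None
        self._rank = None    # rank of the heading currently being read, or None
        self._text = []      # text chunks of the heading currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "hgroup":
            self._group = []
        elif tag in HEADINGS:
            self._rank = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._rank is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADINGS and self._rank is not None:
            # Collapse whitespace so wrapped headings read as one line.
            entry = (self._rank, " ".join("".join(self._text).split()))
            self._rank = None
            if self._group is not None:
                self._group.append(entry)    # held back until the hgroup closes
            else:
                self.entries.append(entry)
        elif tag == "hgroup" and self._group is not None:
            if self._group:
                # Assumption: only the highest-ranked heading represents the group.
                self.entries.append(min(self._group, key=lambda e: e[0]))
            self._group = None


def outline(markup):
    """Return the list of (rank, text) outline entries found in markup."""
    parser = OutlineSketch()
    parser.feed(markup)
    parser.close()
    return parser.entries
```

Fed the bare h1 plus h2 example, this returns two entries, so the strap-line shows up as a sub-section. Fed the hgroup version, it returns only the h1, which is exactly the "hide the subtitle" behaviour the wiki rationale describes.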

I kind of have the feeling that what we should have at our disposal is something that looks more like the following, which allows for a heading and some sort of sub-title(s) (naming isn’t my strong point, I’ve picked ‘strapline’ fairly arbitrarily, but essentially I imagine it as a non-heading sub-title of some nature – maybe even subheading?). It’s not as if hgroup is allowed to hold anything other than headings anyway.

<h1>Monsters live under my bed</h1>
<strapline>Where things go bump in the night</strapline>

It satisfies my problem with using lower-ranked headings for things you consider to either be associated as part of the first heading (or rather, supplementary to it) or not actually headings at all. If I want my full title to be all of the above, it can all go in the h1. If I don’t consider the second line to be part of a heading, it gets to go in its own non-heading supplementary titling element. The rationale quoted above specifically says “subtitle”, although I noticed the current editor’s draft for hgroup does mention “subheadings”.

Do you follow my drift?

If we’re in the business of having the opportunity to create new elements, can’t we just create one that actually satisfies the requirement explicitly, rather than sort of allowing authors to do things that seem somehow hypocritical to the point of heading elements in most other contexts? I also realise that purist intentions fall waaaaay down the list of priorities when compared to the requirements of paving existing usage, but as an author as well, I feel that there’s something fundamentally inaccurate about treating a heading as a non-heading. As an author I want to be able to be as accurate as possible.

Is it just time for me to let go of the idea that headings do the job of creating and naming sections in a document outline?

Aren’t semantics fun!

HTML5 Microdata – Over-cooked?

What is Microdata?

Microdata is HTML5’s answer to how we should go about embedding machine-readable data in our mark-up.

At a high level, microdata consists of a group of name-value pairs. The groups are called items, and each name-value pair is a property. Items and properties are represented by regular elements.

A simple example looks something like this:


<div item>
 <p>My name is <span itemprop="name">Frances</span>.</p>
 <p>I work for the <span itemprop="company">BBC</span>.</p>
 <p>I am <span itemprop="nationality">British</span>.</p>
</div>

Where the item has 3 properties with values (name:Frances, company:BBC, nationality:British).
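As a rough illustration of what a consumer of this markup would do, here is a small Python sketch that pulls the name-value pairs out of a single item. It follows the draft syntax shown above (a bare item attribute plus itemprop), and it makes several simplifying assumptions: one item per document, no nested items, no subject handling, and every value taken from the element's text content. The names ItemSketch and item_properties are invented for this example.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they are left out of depth tracking.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ItemSketch(HTMLParser):
    """Collects itemprop name/value pairs from the first element carrying 'item'."""

    def __init__(self):
        super().__init__()
        self.properties = {}   # property name -> text value
        self._depth = 0        # open elements inside the item (0 = outside it)
        self._current = None   # (names, depth, text chunks) for the open itemprop

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        attrs = dict(attrs)
        if self._depth == 0:
            if "item" in attrs:          # a bare attribute arrives with value None
                self._depth = 1
            return
        self._depth += 1
        if "itemprop" in attrs and self._current is None:
            # Like class, itemprop may hold several space-separated names.
            names = (attrs["itemprop"] or "").split()
            self._current = (names, self._depth, [])

    def handle_data(self, data):
        if self._current is not None:
            self._current[2].append(data)

    def handle_endtag(self, tag):
        if self._depth == 0 or tag in VOID:
            return
        if self._current is not None and self._current[1] == self._depth:
            names, _, chunks = self._current
            value = " ".join("".join(chunks).split())
            for name in names:
                self.properties[name] = value
            self._current = None
        self._depth -= 1


def item_properties(markup):
    """Return a dict of the properties found inside the item in markup."""
    parser = ItemSketch()
    parser.feed(markup)
    parser.close()
    return parser.properties
```

Run over the example above, this gives a dictionary with the three keys name, company and nationality. Text outside the item, or inside the item but outside an itemprop element, is ignored.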

You can then associate item properties with items that the property is not a direct descendant of, with the subject attribute.

Essentially, you have some new attributes at your disposal:

  • item – to specify a group.
  • itemprop – to define the property of an element inside an item.
  • subject – to associate a property with a non-parent item.

You can also type items with a URL, reverse DNS labels or a pre-defined type (and each itemprop can accept multiple properties, as you’d expect with class):

Here, the item is “org.example.animal.cat”:


<section item="org.example.animal.cat">
 <h1 itemprop="org.example.name">Hedral</h1>
 <p itemprop="org.example.desc">Hedral is a male american domestic
 shorthair, with a fluffy black fur with white paws and belly.</p>
 <img itemprop="org.example.img" src="hedral.jpeg" alt="" title="Hedral, age 18 months">
</section>

In this example the “org.example.animal.cat” item has three properties, an “org.example.name” (“Hedral”), an “org.example.desc” (“Hedral is…”), and an “org.example.img” (“hedral.jpeg”).

Quotes and examples (slightly personalised) come from the HTML5 working draft.

My reservations

My gut instinct with microdata is that it’s overcomplicating things. We have RDFa already if you really want to get into the nitty-gritty of machine-readable data and, dare I say it, microformats and good semantic practice for creating shared vocabularies for plain-old semantic HTML. I’m not sure HTML5 necessarily needs this sort of extra solution.

The last example above, with the reverse DNS typing, just looks so… heavy. Something about it just doesn’t feel right and its actual value to me remains unclear, or at least I can’t see the value of specifying the path on each element. Couldn’t that be inferred from the structure, or subject used where ambiguities appear, and then as a last resort specify it on each element?


<section item="org.example.animal.cat">
 <h1 itemprop="name">Hedral</h1>
 <p itemprop="desc">Hedral is a male american domestic
 shorthair, with a fluffy black fur with white paws and belly.</p>
 <img itemprop="img" src="hedral.jpeg" alt="" title="Hedral, age 18 months">
</section>

The itemprop attribute bothers me most. I can’t help but think that all the examples shown in the draft would still work if itemprop was replaced with class. The class attribute is already designed to take a semantically rich term for the element. Worse still, assuming class is used appropriately, you’ll end up with unnecessary repetition across the attributes.


<div item>
 <p>My name is <span class="name" itemprop="name">Frances</span>.</p>
...
</div>

The subject attribute examples aren’t great, which doesn’t help their case – they don’t seem that real world (although there are plenty of good reasons why you might need subject – just look at the microformat include-pattern for example, and how that would be improved with it). A few of the examples could be better represented and relationships then inferred from the element structure (and I wouldn’t mind, but HTML5 already offers a boat-load of new elements to take away much of the ambiguity that HTML4 had – but just sections and headers go a long way to tying information notionally together).

The microdata proposal seems to be about making explicit what could otherwise already be inferred from the actual elements and values (although I’ll concede that it’s often inaccurate or very difficult). Wanting to be exact isn’t a terrible idea (it works really well for the for attribute, for example) and I do like disambiguation. I just don’t think the current proposal really solves the right problems as it stands.

I do think that subject has the most legs of the new attributes, though, but surely it could be as simple as:


<div id="about">
<p>I'm Frances and I like to complain about things on the internet.</p>
</div>
...
<p subject="about">I own no cats. :(</p>

Let the subject do what for has done for label, but across all elements, tying wayward bits of information to an ID (or maybe simply use subject alone to tie pieces of information together – but then this starts to feel like a class job again).

Or an example with class in place of itemprop and using a pre-defined vocabulary:


<div id="vcard">
<p>I'm <span class="fn">Frances</span> and I like to complain about things on the internet.</p>
</div>
...
<p subject="vcard">I still own no cats. :( I do work for the <span class="company">BBC</span> though. </p>

My final concern, which actually could apply to HTML5 as a whole and is more of a general “are we ready for this yet?” thought, is that this is a lot for an author to consider. You look at the web as it stands now, and most of it isn’t well written. Elements are abused, misused or completely forgotten (and attributes fare worse).

HTML5 offers a raft of new elements and attributes to aid clarity in information, accessibility and flexibility. Do we really think that authors on the whole have a great track-record of implementing the specs well? These new microdata attributes make what could already be a simple lesson (use class meaningfully) into a much steeper learning curve, watering down the overall benefit.

I’m not suggesting that that should be an excuse to not make HTML5 as rich as possible, but it should always be in mind that the web is about enabling normal people to share information – it’s not just an intellectual experiment for web developers.

Microdata is in the early draft stage – so I realise things will change.

Disclaimer

It’s well known that I’m a microformats busy-body, but this has nothing to do with my distaste for microdata as the spec stands. Sure, the two things have similar aims, but microformats has always been a solution for the here-and-now. HTML5 still “supports” microformats, and when HTML5 is ready, microformats will simplify (using the time element can’t happen soon enough) and continue to do what they have always done. I like HTML5 and want it to succeed. I am in no way condoning microformats over microdata or generally comparing the two.

Adding XFN

I recently re-added XFN tags to my links (read: blogroll). XFN is another microformats open standard. As with most microformats it’s very simple, and some blogs will do it for you by default. What it basically means is you add rel="relationship" to the link of the person to give the link some additional meaning.

For example, if I wanted to link to my friend Lana, I can write:

<a href="http://lanadenise.wordpress.com" rel="met friend">Lana’s blog</a>

This indicates that Lana is a friend who I have met. If you leave out the “met” it can be a friend you haven’t yet met (i.e. online). There’s a handful of predefined relationships that should be used but there’s just enough. You can indicate family members, co-workers and vague connections.

Why would you bother, I hear you ask? Well, it gives some extra meaning to my markup for one. You know how I love semantics. But after badgering my Dad onto WordPress so I’d have a legitimate reason to use a family XFN tag, we discussed some of the awesome things about it (which had also been mentioned on #microformats). For example, my Dad has a website because he’s interested in finding, and being found by, distant relatives. Imagine a few years down the line when everyone has a blog (don’t they already?) and uses XFN tags on the links to their other family members with blogs. You could easily pull up a diagram based on these interconnected links and see who is related to whom. An instant family tree!
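The family tree idea is easy enough to sketch. The Python below is only a rough illustration under some assumptions: the pages have already been fetched and are handed over as a dictionary of URL to HTML, the family values are the ones listed in the XFN 1.1 profile (child, parent, sibling, spouse, kin), and the function names and example URLs are made up.

```python
from html.parser import HTMLParser

# Family relationship values from the XFN 1.1 profile.
FAMILY = {"child", "parent", "sibling", "spouse", "kin"}


class XFNLinks(HTMLParser):
    """Collects (href, set of rel values) for every link that carries a rel."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        # rel is a space-separated list, e.g. rel="met friend".
        rels = set((attrs.get("rel") or "").lower().split())
        if attrs.get("href") and rels:
            self.links.append((attrs["href"], rels))


def xfn_links(markup):
    """Return the (href, rels) pairs found in one page of HTML."""
    parser = XFNLinks()
    parser.feed(markup)
    parser.close()
    return parser.links


def family_edges(pages):
    """pages maps a page URL to its HTML; returns sorted (from, relation, to) triples."""
    edges = []
    for url, markup in pages.items():
        for href, rels in xfn_links(markup):
            for rel in sorted(rels & FAMILY):
                edges.append((url, rel, href))
    return sorted(edges)
```

Given my page linking to Dad with rel="parent met" and his page linking back with rel="child met", family_edges returns one triple in each direction, and a "met friend" link like Lana’s is left out because it isn’t a family value. Drawing the actual diagram from those triples is left to whatever graphing tool you like.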

Alternatively, you could look up people who work together, or instantly pull up a group’s social network based on reciprocated links. Also, it means I can tie other websites that I use to this page, providing they all show the rel="me" which will ultimately end here. (See Identity consolidation with the XFN rel="me" value.)

So, I added that, and after spotting that I had accidentally misspelt his surname and telling me that I should blog this, Tantek suggested I also hCard the links. Not a bad idea! So now you can grab my friends’ names, websites and what they mean to me all in one go.

Apart from my inability to spell some names correctly, XFN is a very simple-to-add but fully loaded tag for links, so I had no issues with implementing it.

I think it’s something that will be relied upon more and more in the future for a range of uses and services, so it’s really worth adding now and getting a grip on. Mixing XFN with VoteLinks (which I have yet to use anywhere) and no-follow seems like an interesting prospect and perhaps could be useful for better determining page ranking or just aiding web searches. I’m no innovator, but I’m sure someone will come up with a good way to utilise these features together.

The question is, what should I format next? hResume?