01 Dec

Schema-org, microformats and more science please

A normal conversation in the GovUK (or any office I frequent) today went*: “Can we get some microformats on that page?”, I suggest as I spot a section of our site outputting a boat-load of addresses. “No problem – but what’s this about schema-org?”. “Yeah, yeah.. we can hedge our bets and throw their mark-up in there too, it’s just some extra itemprops. *flippant scoff* I’ll send you a complete snippet example, because I’m just nice like that.”

And that’s what I did. And it looked like this:

<div class="vcard" itemscope itemtype="http://schema.org/Organization">
  <p class="org" itemprop="name">Department for Transport</p>
  <p class="adr" itemprop="address" itemscope 

itemtype="http://schema.org/PostalAddress"> <span itemprop="streetAddress"> <span class="extended-address">Great Minster House</span> <span class="street-address">76 Marsham Street</span> </span> <span class="locality" itemprop="addressLocality">London</span> <span class="postcode" itemprop="postalCode">SW1P 4DR</span> </p> <p>Telephone: <span class="tel"
itemprop="telephone">0300 330 3000</span></p> <p>Website: <a href="http://www.dft.gov.uk" class="url"
itemprop="url">www.dft.gov.uk</a></p> <p>Email: <a
class="email" itemprop="email">firstname.surname@dft.gsi.gov.uk</a></p> </div>

Holy massive-code-snippet, Batman. I was surprised by the size. I know, I can feel people digging up links already on the attack and defence of “bloat” when using microformats alone, but seriously guys, IT’S HUGE. I felt guilty saying “this is what you’ve gotta add to get this mark-up to mean something”. Here’s a more broken down comparison:

Here’s the address, raw, at just over a tweet’s worth (167 chars):

Department for Transport
Great Minster House
76 Marsham Street
Telephone: 0300 330 3000
Website: http://www.dft.gov.uk
Email: firstname.surname@dft.gsi.gov.uk

Here’s the address with elements on it to get at its separate pieces, bringing us up to 356:

<p>Department for Transport</p>
<p>
  <span>Great Minster House</span>
  <span>76 Marsham Street</span>
  <span>SW1P 4DR</span>
</p>
<p>Telephone: 0300 330 3000</p>
<p>Website: <a href="http://www.dft.gov.uk">www.dft.gov.uk</a></p>
<p>Email: <a href="mailto:firstname.surname@dft.gsi.gov.uk">firstname.surname@dft.gsi.gov.uk</a></p>

Now let’s throw some classes on to those and get a bit of meaning in there (I mean, you may want to style them up, get things on new lines etc., so using the microformat classes is handy for that alone.**). We’ve got a vCard, people! (565):

<div class="vcard">
  <p class="org">Department for Transport</p>
  <p class="adr">
    <span class="extended-address">Great Minster House</span>
    <span class="street-address">76 Marsham Street</span>

    <span class="locality">London</span>
    <span class="postcode">SW1P 4DR</span>

    <p>Telephone: <span 

class="tel">0300 330 3000</span></p> <p>Website: <a href="http://www.dft.gov.uk"
class="url">www.dft.gov.uk</a></p> <p>Email: <a
class="email>firstname.surname@dft.gsi.gov.uk</a></p> </div>

And now let’s make it schema-org friendly using microdata (863):

<div class="vcard" itemscope itemtype="http://schema.org/Organization">
  <p class="org" itemprop="name">Department for Transport</p>
  <p class="adr" itemprop="address" itemscope 

itemtype="http://schema.org/PostalAddress"> <span itemprop="streetAddress"> <span class="extended-address">Great Minster House</span> <span class="street-address">76 Marsham Street</span> </span> <span class="locality" itemprop="addressLocality">London</span> <span class="postcode" itemprop="postalCode">SW1P 4DR</span> </p> <p>Telephone: <span class="tel"
itemprop="telephone">0300 330 3000</span></p> <p>Website: <a href="http://www.dft.gov.uk" class="url"
itemprop="url">www.dft.gov.uk</a></p> <p>Email: <a
class="email" itemprop="email">firstname.surname@dft.gsi.gov.uk</a></p> </div>

And we’re done. All I wanted to do was say “this, dear Computer, is an address”. Just getting some frankly useless out-of-the-box HTML elements on the raw data more than doubles its size (167 to 356), then we double it again to actually make it useful.

Now, I know size isn’t everything, and this is a pedantic, slightly silly, and probably less than accurate example. We’re not crazy obsessed with keeping our pages below a certain size anymore (Ah… I remember back when the BBC S&Gs insisted that every page had to be less than 200k down the wire including script and CSS AND images. Those were the days.), but it’s not something to be sniffed at either. Particularly with mark-up. Increased size probably suggests increased complexity – more work for everyone, more chance of someone bungling the order or nesting, more simply “I can’t be bothered”. Colour me dubious. I just want to highlight how much we add on to HTML to make it actually do what we need.

Itemscope and itemtype, a brief diversion

I had one of those Am I crazy, but why are there two properties on these things? moments. When would you ever use one without the other? The spec says you can use itemscope alone, but without itemtype it’s a bit meaningless. I think I’d do away with itemscope and have itemtype only, but with a value: either a URI or something meaningful to the internal vocabulary. itemscope seems to exist solely to say “the things inside me are related”, but by the very nature of it being the parent of those items, that’s already implied, and with a class name of something meaningful (say, hcard), or just the itemtype (with a useful value), it’s explicit to data consumers.
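
For illustration, here’s the two-attribute split in practice (a minimal sketch; per the spec, itemscope creates the item and itemtype names its vocabulary):

<!-- itemscope alone: a valid, but untyped and near-meaningless, item -->
<div itemscope>
  <p itemprop="name">Department for Transport</p>
</div>

<!-- itemscope plus itemtype: the same item, now typed -->
<div itemscope itemtype="http://schema.org/Organization">
  <p itemprop="name">Department for Transport</p>
</div>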

This isn’t sarcasm: I would gratefully receive an explanation as to why there are two attributes instead of one.

Back in the room: Is this seriously what we expect authors to do?

I think I’m still struggling to understand why microdata is a separate specification (or even exists if it’s not being used as a mechanism to get stuff into HTML long-term). You can achieve exactly this richness with the current attributes supplied in HTML, and I don’t even mean just the microformats class way. The data- attribute is pretty handy, though, and seems ripe for stuffing with machine data (why shouldn’t it take a URI if you really need it?).
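
Here’s a hypothetical sketch of that musing – not valid microdata, and data-* is specified as private to a page’s own scripts, so treat it as a thought experiment rather than a proposal:

<!-- hypothetical: the vocabulary URI stuffed into a data-* attribute
     instead of a separate itemscope/itemtype pair -->
<div class="vcard" data-type="http://schema.org/Organization">
  <p class="org">Department for Transport</p>
</div>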

But I digress.

Microdata with schema-org is solving a problem we’ve already solved in microformats, but in an equally not-quite-there way (having to specify itemtype with a URI more than once in a page, for items that are the same but not within the same parent, feels filthy, for example). They are just as bad as each other, in slightly varying ways: useful for proving a point, allowing growth and putting out examples (not that all of these bonuses are currently being made the best of), but crappy if this is all we can muster for the long-term, high-volume, regularly published data representation patterns in HTML. We’re still asking authors to jump through hoops for things they shouldn’t have to.
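
To sketch that repetition complaint: two sibling items of the same type each need the full type URI spelling out, because there’s no way to declare a vocabulary once for the whole page:

<!-- the same long type URI, repeated for every disjoint item -->
<p itemscope itemtype="http://schema.org/Organization">…</p>
<p itemscope itemtype="http://schema.org/Organization">…</p>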

Microformats, schema-org, whatever… is this really our game plan now? Just keep throwing ever more bloat into already creaking elements when you just want to do something really common? What’s the strategy for getting this stuff out of this mess and into the language?

You might be asking why bother aiming to get those stronger patterns into HTML, if this mechanism basically works for getting a machine to figure out what the hell you’re trying to say, but you may as well be asking why you have any semantically meaningful elements in HTML at all if that’s the case. HTML version 5 is redefining some elements to have better semantic meaning because HTML is the language of authors, and to authors and consumers meaning matters.

Without a plan for gathering evidence of popularly used patterns directly from microformats or microdata (using them as formal methods of research, testing and development), or from what people (actual, real developers – not just the big search engines) are doing in general, we’ll end up with no progress, or the wrong progress, in HTML. I believe a formal process for how and when this happens should be defined: what constitutes critical mass for a common pattern, how the information should be gathered, how patterns will be proposed formally in the WG and promoted into the language proper, and so on.

I want evidence-based HTML that will evolve using clearly defined mechanisms.

*Conversation shortened and re-written with an artistic license and possibly some (many; “nice” may be a stretch) inaccuracies.

**Yes, I’m casually suggesting that microformats are “free” if all you want to do is get your stuff out there with the minimum you’ll need to be machine-friendly and human-eyes-pretty.

31 Oct

Gold-plating the cow paths

I was quoted a couple of weeks ago as saying, albeit in private, the following:

“HTML fails to be simple if it can’t provide what authors regularly need and end up turning to other encodings” — @phae


For context, that was in response to a remark made by a friend that HTML fails if authors can’t use it because it has become too complex and attempts to describe too much. My response was that it fails not because it’s complicated, but when an author cannot express their content accurately with the toolkit they’re supplied and have to go to another encoding to find what they’re looking for. That’s the language passing the buck, in my opinion.

Don’t get me wrong – I’m not suggesting HTML should cover every niche semantic everyone is ever going to want to express ever. That would be crazy and confusing. HTML should express what is most commonly used, and at the moment it doesn’t – which is why we still see microformats, microdata, component model, schema.org etc. trying to fill the gaps. And not just trying to fill the gaps, but trying to provide data on which decisions can be made about what should be in HTML.

HTML, and a platform that provides what authors need, should be the end goal. Microformats, et al., are the research grounds that should be directly contributing the evidence and data they are able to garner. In fact, the most popular microformats, shown through demand and usage, should simply be in HTML as standard, provided for with semantically appropriate new elements.

We’ve seen this work. Microformats started doing things with dates – most specifically, hCalendar. It had a slightly kludgy way of marking up time, using abbr. The accessibility lot were rightfully less than impressed, and other patterns were tried – title and spans and all kinds of things. But in short, it was shown that time gets talked about a lot, and we needed something better. We got <time> in HTML. Hooray! The system works! Well, except when it doesn’t. Go read Bruce Lawson’s take, as the powers that be removed time and replaced it with data. Gee, thanks.
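
To make that evolution concrete, here’s roughly what the shift looked like (dates illustrative):

<!-- hCalendar's abbr date-time pattern: some screen readers expand
     the title and read out the raw ISO string -->
<abbr class="dtstart" title="2011-10-31">31st October</abbr>

<!-- the <time> element that pattern helped justify -->
<time class="dtstart" datetime="2011-10-31">31st October</time>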

We shouldn’t expect authors to go in search of richer mark-up from other sources when what they’re trying to do is really common, when a need has been shown, and a pattern has been proven.

12 Aug

SXSWi 2011 Microformats panel

It is that time of year again: SXSWi panel pimpage! I’ve put together a somewhat vague panel proposal on behalf of microformats.org and I would appreciate it if you could give it a vote.

Apparently voting only counts towards a relatively small percentage (30%) of whether or not it will be selected, but with 2346 proposals in the system, I suspect it counts a lot more than that.

The session is rather vaguely defined because I’m not really sure right now what’ll still be interesting in a few months. I also want to garner as many opinions from the community as I can about what they want to know more about, see spoken about, or shown off – so do make your voice heard in the comments.

SXSW submissions are a bit nuts, really.

The mega-conference happens in March every year. By the time you’re done clearing your credit card bill and the fuss on twitter has died down a few weeks after the event, it’s already time to submit proposals for the coming year with the deadline at the start of July.

That means you need to think about your proposal a good 9 or 10 months before the next event.

In my mind, it’s incredibly difficult to predict what will be a hot topic or really relevant 10 months down the line in an industry like ours. Things move incredibly quickly. I also find it very difficult to know what to vote for – I may find at the beginning of next year that actually, I really could have done with knowing more about The Latest Technique, but right now I don’t know what it is to vote for it.

I also worry that interesting topics that I don’t know about yet won’t have the community around them to rally support and get the votes. Inevitably, the topics that are most trendy or have the most well-known organisers/panelists will be the ones that get the most votes. They tend not to be the panels I’ve enjoyed the most, though. Unfortunately, it’s becoming increasingly hard to figure out which sessions are going to be great and which aren’t, since SXSW is just so big now – I think it has become quantity over quality. </ complain>

Anyway, there’s not a lot I can do about that other than play along and attempt to put together a session at a level that I deem acceptable quality. I do want to see microformats.org have representation there, so help me out, huh?

p.s. The spelling of the tag “microformats” as “micoformats” is not mine. It’s theirs. And I asked to have it corrected, but apparently their system doesn’t easily allow for that at the moment. WTF?

16 Jun

Science Hack Day, Turing Tests and Google

For Science Hack Day, I have been thinking about a topic that was of great interest to me whilst I was at university – artificial intelligence.

Science Hack Day hasn’t actually happened yet, by the way. It’s going on this weekend (19th & 20th June) at the Guardian offices, and there’s still time to sign up if you’re interested. This is an idea I was playing around with, but I probably won’t be doing it at the weekend unless it piques the interest of someone else (with more linguistic intellect). Feel free to bug me if this is a topic you want to chat about.

The Turing Test

One of the basic concepts and experiments in the AI world is the now defunct, but intellectually and philosophically interesting, Turing Test. In the simplest terms, the test attempts to prove intelligence by showing human characteristics through dialogue and natural language: genuine human testers are blindly pitted against either another real human being or a test program, and must guess whether their conversational partner is human. Every year, challengers from around the world still compete in this test, producing complex computer programs that can converse with human beings and nearly fool them into believing they too are human. No one has yet created a program that behaves accurately (or, more often, randomly) enough to fool participants completely – which is why it remains an interesting, although essentially irrelevant, problem.

The reason this test is defunct as a gauge of intelligence is pretty obvious in hindsight. Being able to converse like a human being might show that whatever is doing the conversing can take apart the constituent parts of a sentence and cobble them back together with some new information to fool a human, but it’s not really showing other markers of intelligence – specifically, the ability to think. Nor does an entity being unable to converse in this way preclude it from having intelligence – you need only look around our own animal kingdom to see the wealth of intelligence shown in organisms that have no verbal language. The ‘Chinese Room’ is the original thought experiment describing this specific problem, which you should totally go and read about right now.

Now, I’m not for one moment suggesting that over 2 days (or 2 lifetimes) a person such as myself, with no linguistics or complex-algorithms training, could create a program that could have a go at passing the Turing test and winning the Loebner Prize. But I got to thinking about how people interact with the internet, and whether the internet itself could be considered to have the capabilities, and the depth and range of knowledge, to show ‘intelligence’ as Turing would have defined it through this test.

Google as an intelligent conversationalist

Go to Google and ask it a question – even better, ask it a question and hit ‘I’m feeling lucky’. Most of the time it produces an ‘answer’ that’s pretty bloomin’ accurate to what you’re looking for. Take a sample of the page that possibly directly answers that question and cobble it into some pidgin English – would that do as a conversational retort? Reckon it could have a stab at knowing the punchline to your rubbish ‘Knock knock…’ joke? I think it could.

In fact, from the Loebner Prize rules, the sample questions are all easily answerable by Google – the only thing it would struggle with is the memory part, but with Google’s ever growing logging of what kind of information you search for, it’s only a short way from that.

I was googling about trying to find other people who must have been thinking about using search engines for Turing tests, and came across John Ferrara in 2008 discussing the user interaction benefits of using search in a way that would produce Turing-test-ready results (I particularly like his accurate prediction that ontologies are the way forward – more on that later). Google is clearly doing some really interesting, and without doubt highly complex, things around parsing search terms and working out what the interesting parts of the query are. They’re doing Natural Language Parsing, but just one way – from the asker to the responder.

Natural Language Parsers

So, I started digging about on the web for a natural language parser, to see if I could maybe package up Google results in one-line retorts. In JavaScript. Mostly because I’m a client-side developer, but also because it seemed like a funny idea (one late night after a couple of Amstels) and JS can be lightning fast in the right environment. Unsurprisingly – there wasn’t one. I found a nice little ‘parts of sentence’ tagger that someone had ported from another project into JS, which seemed like a good start, and there’s OpenNLP – the open source hub for NLPs (mostly in Java, Perl and Python). Then Jake suggested I port one of the Python ones to JS. Ah hah hah, where’s that <sarcasm> element when you need it?

The highly complex NLP part is really only the dressing. It’s the bit that does the fakery, reacts and responds and produces pretend empathy, and is essentially what people trying to win the Loebner Prize care about. To be honest, there are already more real people behind machines to talk to on the internets than we really need, let alone adding a bunch of equally inane computer ones – so I’m not really interested in that to any complex level. I just need something relatively simple.

I am interested in mining the ever-growing source of richly marked-up data on the web, and presenting it back to a human being in a friendly, natural way. Basically, I want one of those slightly-sinister robot voices talking to me from my computer, as featured in all good sci-fis (maybe less HAL and more GERTY), who can coolly and calmly, for example, present me with the probable likelihood of poisoning myself by eating out-of-date eggs, or what factor suncream it might be wise to wear to the park tomorrow so that I don’t burn to a crisp. An information supplier and sympathiser that’s smarter than me, knows about more sources of information than I ever could, and can save me a bit of time wading through Google results.

Let’s talk

So, on to my fuzzy notion of how this might work, just as a thought experiment at first and maybe a slightly naff proof of concept.

Blindly searching google for sensible responses from any old web page seems foolish. An awful lot of sites continue to be badly formed and unintelligible to machines. The obvious thing to do is restrict searches to sites with well-formed data – microformats and RDF seem like the obvious things to look for. This clearly poses a slight problem in that not all topics exist in well-formed data, but over time, that’ll improve. To make this proof of concept easier, and one that I could feasibly think about building in a weekend, I’m therefore going to limit the topics of interest to data I know I can get at in a well-formed model.

Let’s have a chat about food. I’m going to propose a fictional conversation that I want to create the responses to automatically.

Maybe we want to ask our machine:

Do you know any good vegetarian recipes?

A good response might be:

Yes, I know 20582746 vegetarian recipes. Do you want to narrow it down a bit?

Yes, I’m looking for a good recipe for a feta and spinach tart.

I have a good recipe for that. Would you like me to give you a link to it, or just tell you the ingredients?

I want to stop there and illustrate a couple of interesting things about these sentences. Firstly, the word ‘good’. How could a machine know if a recipe is good? Well, hRecipe allows for a recipe to receive a rating – the machine could use this to determine whether to describe the recipe it’s found as ‘good’. Likewise, I could have asked it ‘What’s the worst meal you’ve eaten?’ and perhaps it trawls off for the lowest-rated recipe it can find and declares that its least favourite. It kind of makes me think that this machine person would need to be called Legion, because rather than having the opinion of an individual (or rather, the opinion of the programmer), it has the crowd-sourced opinion of all web participants.

Great. Does it have tomatoes in it? I don’t like tomatoes.

No. Would you like the recipe now?

Yes, what are the ingredients?

And so on… Having a program read back the parts of a well-formed recipe is really easy. Recipes marked up as hRecipe clearly define each of the parts. You could ask it to read you step one of the method, or repeat step 3, or double check what temperature the oven needs to be at. To be honest, you could obviously be reading that directly yourself, but the act of marking up information like that makes it really easy to programmatically extract useful, relevant information out of a webpage, strap it into some semblance of natural English, and read it out to a person in such a way that they might believe a human being was interpreting the page – which they could find more accessible. And that’s the ticket, really. Google search results, or rather the elements derived from rich data snippets, become the lexicon element of the previously mentioned NLPs.
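
For instance, a minimal hRecipe sketch (class names from the hRecipe draft; the recipe itself is made up):

<div class="hrecipe">
 <h2 class="fn">Feta and spinach tart</h2>
 <ul>
  <li class="ingredient">200g feta</li>
  <li class="ingredient">300g spinach</li>
 </ul>
 <ol class="instructions">
  <li>Preheat the oven to 180°C.</li>
  <li>Fill the pastry case and bake for 40 minutes.</li>
 </ol>
</div>

Each part a program might want to read back – the name, the individual ingredients, the instruction steps – is directly addressable.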


What it probably couldn’t do is tell you how it’s feeling or where it lives – the sort of questions and topics that turn up in the logs of Turing tests – but really, does it matter? It would probably also get confused really easily by badly formed pages, and it would just as happily give you bad, irrelevant or plain gibberish responses sometimes – but all computers will do that – which is all the more reason to make pages as well-formed and parsable as possible.

Even if my notion of a simple friendly-faced Google bot couldn’t pass the Turing Test, I bet that if Alan Turing had still been alive at the advent of Google and Wolfram Alpha and the like, he’d be bloody impressed, and pleased to know that he probably instigated some of it.

Which reminds me – June 2012 will celebrate Turing’s 100th birthday – pretty sure we’ll need to have an extra-special Science Hack Day for that too, don’t you think?

24 May

HTML5 Microdata – Over-cooked?

What is Microdata?

Microdata is HTML5’s answer to how we should go about embedding machine-readable data in our mark-up.

At a high level, microdata consists of a group of name-value pairs. The groups are called items, and each name-value pair is a property. Items and properties are represented by regular elements.

A simple example looks something like this:

<div item>
 <p>My name is <span itemprop="name">Frances</span>.</p>
 <p>I work for the <span itemprop="company">BBC</span>.</p>
 <p>I am <span itemprop="nationality">British</span>.</p>
</div>

Where the item has 3 properties with values (name:Frances, company:BBC, nationality:British).

You can then use the subject attribute to associate a property with an item that it is not a direct descendant of.

Essentially, you have some new attributes at your disposal:

  • item – to specify a group.
  • itemprop – to define the property of an element inside an item.
  • subject – to associate a property with a non-parent item (see the sketch below).
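
A sketch of how I read subject working – the ID-based hookup is from the draft, the names are mine:

<div item id="me">
 <p>My name is <span itemprop="name">Frances</span>.</p>
</div>
<!-- this property lives outside the item, so subject points back at the item's ID -->
<p>I am <span subject="me" itemprop="nationality">British</span>.</p>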

You can also type items with a URL, reverse DNS labels or a pre-defined type (and each itemprop can accept multiple property names, as you’d expect with class):

Here, the item is “org.example.animals.cat”:

<section item="org.example.animal.cat">
 <h1 itemprop="org.example.name">Hedral</h1>
 <p itemprop="org.example.desc">Hedral is a male american domestic
 shorthair, with a fluffy black fur with white paws and belly.</p>
 <img itemprop="org.example.img" src="hedral.jpeg" alt="" title="Hedral, age 18 months">

In this example the “org.example.animals.cat” item has three properties, an “org.example.name” (“Hedral”), an “org.example.desc” (“Hedral is…”), and an “org.example.img” (“hedral.jpeg”).

Quotes and examples (slightly personalised) come from the HTML5 working draft.

My reservations

My gut instinct with microdata is that it’s overcomplicating things. We have RDFa already if you really want to get into the nitty-gritty of machine-readable data and, dare I say it, microformats and good semantic practice for creating shared vocabularies for plain-old semantic HTML. I’m not sure HTML5 necessarily needs this sort of extra solution.

The last example above, with the reverse DNS typing, just looks so… heavy. Something about it doesn’t feel right, and its actual value to me remains unclear – or at least I can’t see the value of specifying the path on each element. Couldn’t that be inferred from the structure, with subject used where ambiguities appear, and only as a last resort specified on each element?

<section item="org.example.animal.cat">
 <h1 itemprop="name">Hedral</h1>
 <p itemprop="desc">Hedral is a male american domestic
 shorthair, with a fluffy black fur with white paws and belly.</p>
 <img itemprop="img" src="hedral.jpeg" alt="" title="Hedral, age 18 months">

The itemprop attribute bothers me most. I can’t help but think that all the examples shown in the draft would still work if itemprop was replaced with class. The class attribute is already designed to take a semantically rich term for the element. Worse still, assuming class is used appropriately, you’ll end up with unnecessary repetition across the attributes.

<div item>
 <p>My name is <span class="name" itemprop="name">Frances</span>.</p>
</div>

The subject attribute examples aren’t great, which doesn’t help their case – they don’t seem that real-world (although there are plenty of good reasons why you might need subject – just look at the microformats include-pattern, for example, and how it would be improved by this). A few of the examples could be better represented with relationships inferred from the element structure (and HTML5 already offers a boat-load of new elements to take away much of the ambiguity that HTML4 had – sections and headers alone go a long way towards tying information notionally together).

The microdata proposal seems to be about making explicit what could otherwise already be inferred from the actual elements and values (although I’ll concede that such inference is often inaccurate or very difficult). Wanting to be exact isn’t a terrible idea (it works really well for the for attribute, for example) and I do like disambiguation. I just don’t think the current proposal solves the right problems as it stands.
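
The for precedent, for reference:

<!-- explicit association by ID, regardless of nesting -->
<label for="email">Email</label>
<input type="text" id="email" name="email">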

I do think that subject has the most legs of the new attributes, but surely it could be as simple as:

<div id="about">
<p>I'm Frances and I like to complain about things on the internet.</p>
<p subject="about">I own no cats. :(</p>

Let subject do what for has done for label, but across all elements, tying wayward bits of information to an ID (or maybe simply use subject alone to tie pieces of information together – but then this starts to feel like a class job again).

Or an example with class in place of itemprop and using a pre-defined vocabulary:

<div id="vcard">
<p>I'm <span class="fn">Frances</span> and I like to complain about things on the internet.</p>
<p subject="vcard">I still own no cats. :( I do work for the <span class="company">BBC</span> though. </p>

My final concern, which actually could apply to HTML5 as a whole and is more of a general are we ready for this yet? thought, is that this is a lot for an author to consider. You look at the web as it stands now, and most of it isn’t well written. Elements are abused, misused or completely forgotten (and attributes fare worse).

HTML5 offers a raft of new elements and attributes to aid clarity in information, accessibility and flexibility. Do we really think that authors on the whole have a great track-record of implementing the specs well? These new microdata attributes make what could already be a simple lesson (use class meaningfully) into a much steeper learning curve, watering down the overall benefit.

I’m not suggesting that that should be an excuse to not make HTML5 as rich as possible, but it should always be in mind that the web is about enabling normal people to share information – it’s not just an intellectual experiment for web developers.

Microdata is in the early draft stage – so I realise things will change.


It’s well known that I’m a microformats busy-body, but this has nothing to do with my distaste for microdata as the spec stands. Sure, the two things have similar aims, but microformats have always been a solution for the here-and-now. HTML5 still “supports” microformats, and when HTML5 is ready, microformats will simplify (using the time element can’t happen soon enough) and continue to do what they have always done. I like HTML5 and want it to succeed. I am in no way advocating microformats over microdata or generally comparing the two.

07 Jul

Microformats, the BBC and friends

I recently had the job of letting the microformats community know that the BBC were having to drop hCalendar due to accessibility concerns surrounding the use of abbr and the date-time pattern.

My friend and colleague Jake Archibald published a summary of what’s happened so far, what the current alternative suggestions are and the BBC’s take on them. It’s a useful read if you want to catch-up and see where we are.

I think the best thing to come out of this is probably that we’re talking about actual alternatives again, rather than just waiting for more evidence (which often feels like a get-out clause for inaction). Whether we’re doing that right though… well, we’ll see. I appreciate the apprehension that comes with changing something that’s already had the seal of “yep… good to go… use it!” – no one wants to get this “wrong” again. Equally though, I really do hope we can come to a compromise and “solve” the problem this time. Extending HTML 4 was never going to be especially pretty, but bear with us (please don’t mention HTML 5 to me – it’s for your own good).

On lighter notes, here’s a couple of interesting microformatty things:

And lastly, I wanted to mention that I should hopefully have details on the next London Microformats vEvent very soon.

Based on feedback from the last event we held during London Web Week, Drew and I are planning a “Getting Started” event, with back-to-basics semantics and microformats implementations.

16 May

The BBC needs you!

Are you a screen reader user, or know someone who is? Want to contribute to making the Beeb a more accessible place?

The BBC is looking for people to let them know what screen reader users hear when they visit the new Programmes pages – which just happen to contain the ever-controversial abbreviation design pattern used by the hCalendar microformat – and whether they expand and listen to title attributes and abbreviations at all.

Please pop on over to the BBC RadioLabs blog article and leave your feedback, or get in touch if you think you can help test!

03 Apr

Microformats vEvent and London Web Week

I mentioned at the start of the year that we were planning to have another “Microformat vEvent” in the first quarter… well, slightly later than planned, I’m pleased to announce that we’re good to go and you can now sign up!


The event has been delayed so that we could take part in a new, grander event: London Web Week. It’s going to be a solid week of all things webby, and includes such highlights as @media London, BarCampLondon 4, a Web Standards Group event and a new one-day conference called Web Roots, aimed at newcomers who are just getting interested in, or starting out in, web development and design. Even Pub Standards is sneaking in on the act (keep an eye on Upcoming for “The Great Pub Standards Heresy”).

The full schedule of events is available here and I expect it’ll expand to contain a few of the London user groups for various web… things… over the next few weeks.

So, back to the point of my post. Microformats vEvent!

The good news is, I’ve managed to twist the arms of a couple of nice folks to do some speaking for us. We’ve got Dan Brickley and Tom Morris – surprisingly, both usually more aligned with the RDF camp than the microformats one – but I’m personally up for breaking down that wall (and I hope they are too) and seeing if we can’t all “get along”. So, with that in mind, they will each be taking on topics that look at microformats working alongside other semantic web technologies in complementary ways.

Full details on what these guys will be talking about are, again, on the sign-up page, as well as where and when (The Yorkshire Grey Pub, Holborn, Tuesday 27th May, 7pm) you need to show up. Make sure you sign up quickly though – we’ve only got a limited amount of space, and entrance is by ticket only.

18 Feb

SemanticCamp London

I went to one day* of SemanticCamp this weekend at Imperial College in London. It was really enjoyable and it was great to see so many people show up and take part. Kudos to Tom Morris and Daniel John Lewis for their organisational skills.

Ben Ward and I represented microformats.org and did a presentation-ish chat and Q&A session entitled “Microformats: State of the Nation” and covered recent happenings in the microformat world and some things we might hope to see this year.

So, when I say “presentation-ish”, what we did was chat through a list of things that we’d thought about during the morning, presented via a quick list from my email inbox, via my new Asus EEE (which is still super cute, and is the black 4GB Surf, before you ask). Yeah, we’re professional. To sum up what we covered, here’s an elaborated version of said list:

  • Current:
    • Big Deployments
    • Kelkoo listings (hListing, soon), hCards
    • Google Social Graph (XFN and FOAF indexing)
  • Parsers increasing:
  • New Formats:
  • Accessibility:
  • Future:
    • Distributed Social Networking
    • Google Social Graph API
    • Distributed identity with OpenID
    • Distributed contact lists
    • Build on URLs
    • Consolidate identity URLs using XFN
    • hCard providing context and detail
    • XFN describing social relationships

We had lots of questions and it was actually great. One thing that came up a couple of times in a few conversations I had was a desperate need for a full test suite for microformats. Unfortunately, such things are hard to get done because the work is time consuming and not especially interesting or rewarding. I wonder if anyone has experience on the best way to get a full test set written for all current formats?

Our session overran, but thanks for coming by if you did and as always, feel free to come by and join the mailing list(s) and get involved.

* I did not make it to day two, due to late-night Brighton fun for Andy Budd’s flat-warming. Thanks very much for putting us up for the night, Jeremy! Apologies to SemanticCampers who wanted to play with the EEE some more.

12 Dec

London Microformats vEvent

It was the end of 2006 when Drew McLellan and I threw the last microformats event in London, and now it’s almost 2008 and we haven’t had another. Cryin’ shame, I say.

So, we’re going to hold another event. We’re ironing out the details, but the bones of it will be a mostly social event in London in the early part of 2008, with a couple of interesting people talking about the latest microformat and semantic web related things and some beer thrown in for good measure.

To help us out though, we’d really appreciate it if you could register your interest in such an event on the microformats.org wiki. The page you need to visit is here: http://microformats.org/wiki/events/2008-london-microformats-vevent We’ll also be filling up that page with the details as they happen, so do keep an eye on it.

Note: If you don’t fancy signing up to the wiki, don’t worry about it. Drop a comment here and I’ll add you.