Film and Lit 2010

Films (at the cinema, in seen order):

  1. Daybreakers
  2. The Road
  3. A Single Man
  4. The Wolfman
  5. The Crazies
  6. Alice in Wonderland
  7. Ponyo
  8. Shutter Island
  9. Perrier's Bounty
  10. Kick-Ass
  11. Psycho
  12. I Am Love
  13. Double Take
  14. Dogtooth
  15. Four Lions
  16. The White Ribbon
  17. The Bad Lieutenant
  18. Rec 2
  19. Inception
  20. Splice
  21. Down Terrace
  22. The Illusionist
  23. Mother
  24. Scott Pilgrim vs. The World
  25. The Maid
  26. Certified Copy
  27. Cyrus
  28. Winter's Bone
  29. Made in Dagenham
  30. Wall Street: Money Never Sleeps
  31. Let Me In
  32. The Light Thief
  33. Black Swan
  34. Womb
  35. Octubre
  36. Monsters
  37. We Are What We Are
  38. Tron: Legacy
  39. The Town
  40. Love and Other Drugs
  41. The Way Back

Again, they're OO5ed

Best

Again, super lucky to get to put some sci-fi in the top of my list. The Road is captivating, if in an entirely bleak, looking at your pets and wondering if you'd eat them in a crisis, kind of way. My favourite was Monsters. It felt like an antidote to all those silly explosion, chase driven, gung-ho monster movies. It's delicate and subtle, and looks frankly amazing and ultra-detailed, and it doesn't treat the viewer like a complete idiot. It's just lovely. I guess Inception is the big one for everyone else - I liked it a lot, but I've kind of forgotten it already.

I had anticipated Splice as being a winner this year, but it totally missed the mark - Womb turned out to be the much more interesting, in depth, film about cloning and genetics (and it's got Matt Smith in it - what's not to go crazy for?). Tron should have been a massive disappointment, but I managed to keep expectations deliriously low and came out entertained. The film I failed to not get excited about before I saw it was Scott Pilgrim, given the sentimental place that I hold the graphic novels in, and fortunately it is really excellent fun.

I also loved The Illusionist and Ponyo. The former is beautiful, and although it's French it doesn't matter - they hardly utter a word, and when they do it sounds like a Sim - it's totally carried by the perfect animation style. And Ponyo is just adorable (sing the Ponyo song!).

The big marmite film for 2011 is definitely going to be Black Swan. It's a ballet drama? Really? Yeah. It is entirely a must-see film. It's an intensely paced psychological thriller and the ballet bit really shouldn't put anyone off. It's probably one of the best crafted films I've seen this year, if not for a few years.

Also loved lots of others, particularly Down Terrace, Dogtooth and Winter's Bone - all share the commonality of being a bit bleak (or, actually, totally screwed up - don't watch Dogtooth with your family, okay?).

Worst

It's only when compiling this list that I'm reminded of all the complete movie mishaps I've suffered this year. Not least, Alice in Wonderland. I'm a massive fan of the story, as many people know, and I was a fool to even think that a new film would capture everything I love about it. Oh, so disappointed. I rated it more highly at the time than I feel about it now. Damn you, Burton.

Other let downs include a whole slew of films that have brilliant concepts, but they were just half-heartedly or plainly executed - The Crazies springs to mind, as does Four Lions (controversial, I know, but it's a bit meh, to be honest - Chris Morris has a long way to go before he's back in "paedophile dressed as a school" territory), We Are What We Are and Daybreakers.

Mostly this year, there has been some severely pretentious nonsense. I Am Love, Certified Copy, The Light Thief, Double Take and A Single Man - all fairly decent concepts, but unfortunately completely boring. I struggled to stay awake in a couple of those. Mostly designed as fodder for film reviewers to fawn over, but actually, totally ridiculous and unwatchable.

Books

I've given up any semblance of attempting to record what I read. I did, however, buy a 3rd edition Kindle 3G, which I love. Surprisingly. Digital books completely lack everything I love about a beaten-up old paperback, particularly the digging through a dusty bookshop and finding random left-overs of previous owners' lives (ticket stubs, receipts... postcards are my particular favourite), but the convenience and the form factor of this thing are amazing. It's also caused me to re-read or find a bunch of classics, for free from manybooks (released through Project Gutenberg), that I would otherwise never have given the time to. I've read the complete Sherlock Holmes adventures, almost all of Robert Louis Stevenson, a bunch of H.G. Wells, and all sorts of other odds and ends. Metamorphosis struck me as an instant favourite of the classic selection.

Of non-ebooks, I read Philip Pullman's newest book, The Good Man Jesus and the Scoundrel Christ, early in the year. Thoroughly disappointing. More hype than substance, in my mind, and felt a little like a cash-in on his controversial position (I enjoyed the His Dark Materials trilogy). I also read my usual fill of science fiction and re-read some favourites. I read Cormac McCarthy's The Road after seeing it at the start of the year - which is unusual, since I'll generally rush to read a book before I see the film - but it's pretty much identical. Definitely recommend it if you're lacking that stark, miserable, hopeless feeling at the beginning of your new year. :)

24ways 2010

So, I wrote a little article for this year's 24ways on documentation. It's based heavily on the processes we used to develop BBC Glow, so I hope someone finds it useful.

If you're feeling charitable, this year you can buy my article and the other brilliant 23 as an annual from Five Simple Steps: 24ways 2010 Annual, with the proceeds going to UNICEF. Yay!

SXSWi 2011 Microformats panel

It is that time of year again: SXSWi panel pimpage! I've put together a somewhat vague panel proposal on behalf of microformats.org and I would appreciate it if you could give it a vote.

Apparently voting only counts towards a relatively small percentage (30%) of whether or not it will be selected, but with 2346 proposals in the system, I suspect it counts a lot more than that.

The session is rather vaguely defined because I'm not really sure right now what'll still be interesting in a few months. I also want to garner as many opinions from the community as I can about what they want to know more about, see speak or show off - so do make your voice heard in the comments.

SXSW submissions are a bit nuts, really.

The mega-conference happens in March every year. By the time you're done clearing your credit card bill and the fuss on Twitter has died down a few weeks after the event, it's already time to submit proposals for the coming year, with the deadline at the start of July.

That means you need to think about your proposal a good 9 or 10 months before the next event.

In my mind, it's incredibly difficult to predict what will be a hot topic or really relevant 10 months down the line in an industry like ours. Things move incredibly quickly. I also find it very difficult to know what to vote for - I may find at the beginning of next year that actually, I really could have done with knowing more about The Latest Technique, but right now I don't know what it is to vote for it.

I also worry that interesting topics that I don't know about yet don't have the community around them to rally support and get the votes. Inevitably, the topics that are most trendy or have the most well-known organisers/panelists will be the topics that get the most votes. They tend not to be the panels I've enjoyed the most, though. Unfortunately, it's becoming increasingly hard to figure out which sessions are going to be great and which aren't, since SXSW is just so big now - I think it has become quantity over quality. </ complain>

Anyway, not a lot I can do about that other than play along and attempt to include a session that I will attempt to put together at a level that I deem acceptable quality. I do want to see microformats.org have a representation there, so help me out, huh?

p.s. The spelling of the tag "microformats" as "micoformats" is not mine. It's theirs. And I asked to have it corrected, but apparently their system doesn't easily allow for that at the moment. WTF?

hgroups and sub-titles

I realise that queries or concerns about HTML 5 elements should make their way onto the WHATWG mailing list, but I just wanted to get a few thoughts out on here about what I've spent far too long discussing at work recently. It's perfectly likely that I've totally got the wrong end of the proverbial, so this is just me trying to get my mind straight on why I feel something about this is unnatural and I welcome comments to help clarify or discuss.

So, hgroup, eh?

hgroup is one of the new elements featuring in the HTML 5 specification. Its purpose, quite simply, is to group two or more headings together into one block so that subheadings are treated differently and only the first heading becomes part of the document outline.

The hgroup element is typically used to group a set of one or more h1-h6 elements — to group, for example, a section title and an accompanying subtitle.

From the current HTML5 working draft

The WHATWG wiki has the following rationale for requiring the hgroup element:

The point of <hgroup> is to hide the subtitle from the outlining algorithm

WHATWG wiki

Over on HTML5 Doctor, John Allsopp appears to find fault with this element also and suggests that the requirement for hgroup is symptomatic of a flaw in the outlining algorithm. I can see his point, but I'm more concerned that it's a fundamentally inaccurate use of a heading.

In my mind, headings are designed to denote sections. At least, that's what they were used for in HTML 4. Things either went in a heading, because they denoted a new section of content, or they didn't. This is Frances the idealist speaking, I realise this, but still.

Let's say I had a new website about a children's story about monsters, and I wanted to title it "Monsters live under my bed", but it could also have a sub-title or strap-line. As an author, I either want my title to be "Monsters live under my bed. Where things go bump in the night" or I want it to be "Monsters live under my bed" and the next line is incidental and a supplementary strap-line and not something I would consider to be part of my title.

Currently, I might do any of the following:

<h1>Monsters live under my bed 
Where things go bump in the night
</h1>

Example wrapped for legibility, but my story title is the full text and is in a heading.

<h1>Monsters live under my bed 
<span>Where things go bump in the night</span>
</h1>

This one is mostly a stylistic example. The strapline needs to look like a strapline, so I've stuck a span around it (yeah, I know...), but fundamentally I'm still considering it to be part of the title. My story's name is the full text.

<h1>Monsters live under my bed</h1>
<p>Where things go bump in the night</p>

In this case, the title of my story is only "Monsters live under my bed" and because HTML 4 doesn't really offer a suitable element that I would consider "a sub header that isn't a new section of the document" I've stuck the sub-title text in a paragraph.

<h1>Monsters live under my bed</h1>
<h2>Where things go bump in the night</h2>

This one suggests that I have a title and then the first chapter beneath the title is "Where things go bump in the night". That second line is no longer the title of my kids' story. The h2 would be a new indented item in an outline and would suggest that further within the document I may find more h2s and that I have stepped into the document by a level.

What HTML 5 says you should do if you want a sub-title/sub-heading is this:

<hgroup>
<h1>Monsters live under my bed</h1>
<h2>Where things go bump in the night</h2>
</hgroup>

This has the effect of making that h2 not appear in the outline, since it will no longer create a new section. The outline now considers that the title of my story is again "Monsters live under my bed". Any content that comes after this would be within the section titled by the h1. The h2 doesn't count as the start of a new section (as it would if there was no hgroup wrapper). The contents of the h2 are considered a special non-sectioning-heading case, but they're still in a heading element. But if it's meant to be a heading, why isn't it in the h1? Gah!
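To make the difference concrete, here is a rough sketch in JavaScript of how an outline might treat the two cases. It is not the real HTML 5 outline algorithm, just a toy that works on a flat list of headings, and it assumes the outline only takes the highest-ranked heading from an hgroup. The `outline` function and the token shapes are made up for illustration.

```javascript
// Toy outline builder: NOT the full HTML 5 outline algorithm.
// Input is a flat list of heading tokens in document order.
// A plain heading is { level, text }; an hgroup is { hgroup: [ ...headings ] }
// and is assumed to contain at least one heading.
function outline(tokens) {
  const entries = [];
  for (const token of tokens) {
    if (token.hgroup) {
      // Assumption: only the highest-ranked heading (lowest level number)
      // in the group contributes to the outline.
      const top = token.hgroup.reduce((a, b) => (b.level < a.level ? b : a));
      entries.push({ level: top.level, text: top.text });
    } else {
      entries.push({ level: token.level, text: token.text });
    }
  }
  // Indent each entry by two spaces per level below h1.
  return entries.map((e) => "  ".repeat(e.level - 1) + e.text);
}

// h1 followed by a bare h2: the h2 starts a new, nested section.
const withoutHgroup = outline([
  { level: 1, text: "Monsters live under my bed" },
  { level: 2, text: "Where things go bump in the night" },
]);

// The same two headings wrapped in an hgroup: the h2 is hidden.
const withHgroup = outline([
  {
    hgroup: [
      { level: 1, text: "Monsters live under my bed" },
      { level: 2, text: "Where things go bump in the night" },
    ],
  },
]);

console.log(withoutHgroup.join("\n"));
console.log(withHgroup.join("\n"));
```

Without the wrapper the outline gets two entries, with the strapline indented as a sub-section; with it there is a single entry for the h1, which is exactly the behaviour that bothers me.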

I kind of have the feeling that what we should have at our disposal is something that looks more like the following, which allows for a heading and some sort of sub-title(s) (naming isn't my strong point, I've picked 'strapline' fairly arbitrarily, but essentially I imagine it as a non-heading sub-title of some nature - maybe even subheading?). It's not as if hgroup is allowed to hold anything other than headings anyway.

<h1>Monsters live under my bed</h1>
<strapline>Where things go bump in the night</strapline>

It satisfies my problem with using lower numbered headings for things you consider to either be associated as part of the first heading (or rather, supplementary to it) or not actually headings at all. If I want my full title to be all of the above, it can all go in the h1. If I don't consider the second line to be part of a heading, it gets to go in its own non-heading supplementary titling element. The rationale quoted above specifically says "subtitle", although I noticed the current editor's draft for hgroup does mention "subheadings".

Do you follow my drift?

If we're in the business of having the opportunity to create new elements, can't we just create one that actually satisfies the requirement explicitly rather than sort of allow authors to do things that seem somehow hypocritical to the point of heading elements in most other contexts? I also realise that purist intentions fall waaaaay down the list of priorities when compared to the requirements of paving existing usage, but as an author as well, I feel that there's something fundamentally inaccurate about treating a heading as a non-heading. As an author I want to be able to be as accurate as possible.

Is it just time for me to let go of the idea that headings do the job of creating and naming sections in a document outline?

Aren't semantics fun!

Science Hack Day, Turing Tests and Google

For Science Hack Day, I have been thinking about a topic that was of great interest to me whilst I was at university - artificial intelligence.

Science Hack Day hasn't actually happened yet, by the way. It's going on this weekend (19th & 20th June) at the Guardian offices, and there's still time to sign up if you're interested. This is an idea I was playing around with, but I probably won't be doing this at the weekend unless it piques someone else's (with more linguistic intellect) interest. Feel free to bug me if this is a topic to chat about.

The Turing Test

One of the basic concepts and experiments in the AI world is the now defunct, but intellectually and philosophically interesting, Turing Test. In the simplest terms, the test is around proving intelligence by showing human characteristics through dialogue and natural language, and this is shown through genuine human testers being blindly pitted against either another real human being, or a test program, and guessing as to whether their conversational partner is a human or not. Every year challengers from around the world still compete in this test, and produce complex computer programs that can converse with human beings and nearly fool them into believing they too are human. No one has created a program that can behave accurately, or more often randomly enough, to fool participants completely - which is why it remains an interesting, although essentially irrelevant, problem.

The reason this test is defunct as a gauge of intelligence is pretty obvious in hindsight. Being able to converse like a human being might show that whatever it is doing the conversing can take apart the constituent parts of a sentence and cobble them back together with some new information to fool a human, but it's not really showing other markers of intelligence - specifically the ability to think. And neither does an entity being unable to converse in this way preclude it from having intelligence - you just need to look around our own animal kingdom and see the wealth of intelligence shown in other organisms that have no verbal language. The 'Chinese Room' is the original thought experiment that describes this specific problem, which you should totally go and read about right now.

Now, I'm not for one moment suggesting that over 2 days (or 2 lifetimes) a person such as myself with no linguistics or complex algorithms training could create a program that could have a go at passing the Turing test and win the Loebner Prize, but I got to thinking about how people interact with the internet in such a way that maybe the Internet itself could be considered to have the capabilities, and the depth and range of knowledge, to show 'intelligence' as Turing would have defined it through this test.

Google as an intelligent conversationalist

Go to Google and ask it a question - even better, ask it a question and hit 'I'm feeling lucky'. Most of the time it produces an 'answer' that's pretty bloomin' accurate to what you're looking for. Take a sample of that page that possibly directly answers that question and cobble it into some pidgin English, and would that do as a conversational retort? Reckon it could have a stab at knowing the punchline to your rubbish 'Knock knock...' joke? I think it could.

In fact, from the Loebner Prize rules, the sample questions are all easily answerable by Google - the only thing it would struggle with is the memory part, but with Google's ever growing logging of what kind of information you search for, it's only a short way from that.

I was googling about trying to find other people who must have been thinking about using search engines for Turing tests, and came across John Ferrara in 2008 discussing the user interaction benefits of using search in a way that would produce Turing test-ready results (I particularly like his accurate prediction that ontologies are the way forward - more on that later). Google is clearly doing some really interesting, and without doubt highly complex, things around parsing search terms and working out what the interesting parts of the query are. They're doing Natural Language Parsing, but just one way - the asker to the responder.

Natural Language Parsers

So, I started digging about on the web for a natural language parser to see if I could maybe package up Google results in one line retorts. In JavaScript. Mostly because I'm a client-side developer, but also because that seemed like a funny idea (one late night after a couple Amstels) and JS can be lightning fast in the right environment. Unsurprisingly - there wasn't one. I found this nice little 'parts of sentence' tagger that someone had ported from another project into JS, and this seemed like a good start, and there's OpenNLP - the open source hub for NLPs (mostly in Java, Perl and Python). Then Jake suggested I port one of the ones in Python to JS. Ah hah hah, where's that <sarcasm> element when you need it?

The highly complex NLP part is really only the dressing. It's the bit that does the fakery and really reacts and responds and produces pretend empathy and is essentially what people who are trying to win the Loebner Prize care about - to be honest, there are already more real people behind machines to talk to on the internets than we really need, let alone adding a bunch of equally inane computer ones - so I'm not really that interested in that to any complex level - I just need something relatively simple.

I am interested in mining the ever growing source of richly marked up data and sources on the web, and presenting them back to a human being in a friendly, natural way. Basically, I want one of those slightly-sinister robot voices talking to me from my computer, as featured in all good sci-fis (maybe less Hal and more Gerty) who can coolly and calmly, for example, present me the probable likelihood of poisoning myself by eating out-of-date eggs or what factor suncream it might be wise to wear to the park tomorrow so that I don't burn to a crisp. An information supplier and sympathiser that's smarter than me and knows about more sources of information than I could and can save me a bit of time wading through Google results.

Let's talk

So, on to my fuzzy notion of how this might work, just as a thought experiment at first and maybe a slightly naff proof of concept.

Blindly searching Google for sensible responses from any old web page seems foolish. An awful lot of sites continue to be badly formed and unintelligible to machines. The obvious thing to do is restrict searches to sites with well-formed data - microformats and RDF seem like the obvious things to look for. This clearly poses a slight problem in that not all topics exist in well-formed data, but over time, that'll improve. To make this proof of concept easier, and one that I could feasibly think about building in a weekend, I'm therefore going to limit the topics of interest to data I know I can get at in a well-formed model.

Let's have a chat about food. I'm going to propose a fictional conversation that I want to create the responses to automatically.

Maybe we want to ask our machine:

Do you know any good vegetarian recipes?

A good response might be:

Yes, I know 20582746 vegetarian recipes. Do you want to narrow it down a bit?

Yes, I'm looking for a good recipe for a feta and spinach tart.

I have a good recipe for that. Would you like me to give you a link to it, or just tell you the ingredients?

I want to stop there and illustrate a couple of interesting things about these sentences. Firstly, the word 'good'. How could a machine know if a recipe is good? Well, hRecipe allows for a recipe to receive a rating - the machine could use this to determine whether to describe the recipe it's found as 'good'. Likewise, I could have asked it 'What's the worst meal you've eaten?' and perhaps it trawls off for the first lowest rated recipe it can find and declares that its least favourite. Kind of makes me think that this machine person would need to be called Legion, because rather than having the opinion of an individual (or rather the opinion of the programmer), it has the crowd-sourced opinion of all web participants.
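As a rough sketch of that idea in JavaScript: assume the recipes have already been pulled out of their pages into plain objects with a `rating` out of 5. The sample recipes, the field names, the thresholds and the function names here are all invented for illustration, not anything a real parser hands you.

```javascript
// Hypothetical recipe records: `rating` is an average score out of 5.
const recipes = [
  { name: "Feta and spinach tart", rating: 4.6 },
  { name: "Boiled cabbage surprise", rating: 1.2 },
  { name: "Cheese on toast", rating: 3.1 },
  { name: "Grey soup", rating: 1.2 },
];

// Turn the crowd's rating into a word the machine can use in a sentence.
// The cut-off values are arbitrary assumptions.
function opinionOf(recipe) {
  if (recipe.rating >= 4) return "good";
  if (recipe.rating <= 2) return "bad";
  return "okay";
}

// "What's the worst meal you've eaten?" - the first lowest rated recipe.
// The strict comparison keeps the earliest of any tied ratings.
function worstOf(list) {
  return list.reduce((worst, r) => (r.rating < worst.rating ? r : worst));
}

// "I'm looking for a good recipe for..." - only offer it if it rates as good.
function answerGoodRecipe(list, query) {
  const wanted = query.toLowerCase();
  const match = list.find(
    (r) => r.name.toLowerCase().includes(wanted) && opinionOf(r) === "good"
  );
  return match
    ? `I have a good recipe for that: ${match.name}.`
    : "I don't know a good recipe for that.";
}

console.log(answerGoodRecipe(recipes, "feta and spinach"));
console.log(`The worst meal I've eaten is ${worstOf(recipes).name}.`);
```

The machine never forms an opinion of its own here; it just borrows whatever the ratings say, which is the Legion point.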

Great. Does it have tomatoes in it? I don't like tomatoes.

No. Would you like the recipe now?

Yes, what are the ingredients?

And so on... Having a program read back the parts of a well-formed recipe is really easy. Recipes marked up as hRecipe clearly define each of the parts. You could ask it to read you step one of the method, or repeat step 3, or double check what temperature the oven needs to be at. To be honest, you could obviously be reading that directly yourself, but the act of marking up information like that makes it really easy to programmatically extract useful, relevant information out of a webpage, strap it into some semblance of natural English, and read it out to a person in such a way that a person might believe that a human being was interpreting the page, which they could find more accessible. And that's the ticket, really. Google search results, or rather the elements derived from rich data snippets, become the lexicon element of the previously mentioned NLPs.
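Here's a minimal sketch of the reading-back part, assuming the page has already been parsed into a plain object. The object loosely borrows hRecipe property names (`fn`, `ingredients`, `instructions`), but the recipe itself, the exact object shape and the helper functions are made up for the example, and the ingredient check is a naive substring match.

```javascript
// Hypothetical result of parsing an hRecipe page into a plain object.
const recipe = {
  fn: "Feta and spinach tart",
  ingredients: ["200g feta", "300g spinach", "1 sheet puff pastry", "2 eggs"],
  instructions: [
    "Heat the oven to 200C.",
    "Wilt the spinach and squeeze it dry.",
    "Fill the pastry, crumble over the feta and bake for 25 minutes.",
  ],
};

// "Does it have tomatoes in it?" - naive, case-insensitive substring match.
function hasIngredient(r, word) {
  const needle = word.toLowerCase();
  return r.ingredients.some((item) => item.toLowerCase().includes(needle));
}

// "What are the ingredients?"
function readIngredients(r) {
  return `For ${r.fn} you need: ${r.ingredients.join(", ")}.`;
}

// "Repeat step 3" - steps are numbered from 1, as a person would say them.
function readStep(r, n) {
  const step = r.instructions[n - 1];
  return step
    ? `Step ${n}: ${step}`
    : `There is no step ${n}; this recipe has ${r.instructions.length} steps.`;
}

console.log(hasIngredient(recipe, "tomato") ? "Yes." : "No. Would you like the recipe now?");
console.log(readIngredients(recipe));
console.log(readStep(recipe, 3));
```

None of this is clever. All the hard work is done by whoever marked up the page, which is rather the point.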

Limitations

What it probably couldn't do is tell you how it's feeling or where it lives - the sort of questions and topics that turn up in the logs for Turing tests - but really, does it matter? It would probably also get confused really easily by badly formed pages and it would just as happily give you bad, irrelevant or plain gibberish responses sometimes - but all computers will do that - which is a greater reason to make pages as well-formed and parsable as possible.

Even if my notion of a simple friendly-face Google bot couldn't pass the Turing Test, I bet that if Alan Turing had still been alive at the advent of Google and Wolfram Alpha and the likes, he'd be bloody impressed and be pleased to know that he probably instigated some of it.

Which reminds me - June 2012 will celebrate Turing's 100th birthday - Pretty sure we'll need to have an extra special Science Hack Day for that too, don't you think?
