Microformats and such

An entry published by James Bennett on June 29, 2008. Part of the categories Accessibility, Pedantics and Web standards. 12 comments posted.

I hope you’ll forgive this brief diversion from my ongoing attempt to distinguish web developers from web designers, but it’s late, I’ve had a couple beers and I’ve been tinkering a bit with some code. Regularly-scheduled programming will return shortly.

So. The microformats people and the accessibility people are at war with each other, or so it seems (remember to read that article with tongue firmly in cheek). The cause of this tempest in a teapot is a simple enough question: how do you embed both a “human friendly” and “machine friendly” representation of the same date/time in an HTML document, such that you follow the guiding principles of microformats without negatively impacting accessibility for, say, people whose screen readers will automatically expand and read out your clever use of the title attribute?

I’ve prepared a document which encapsulates one approach to solving this problem. This document could be from an application which keeps track of upcoming events and provides users with information about them, and as such it describes an event which will take place in the future.

Your mission, should you choose to accept it, is to write a program which:

  1. Retrieves this document from the above-listed URL.
  2. Parses the document to obtain the date and time on which the event will take place.
  3. Prints out that date and time in the standard format of your choice.

I can see at least two ways to do this, and I’ve already written a short Python program which implements one of them. It’s eleven lines of code, five of which are import statements and one of which retrieves the document, leaving only five lines to parse the document, retrieve the date/time and print it.

If you’d like to try this yourself before I explain how it works, stop scrolling down the page now. Come back and read the next section when you’re ready.

How it works

As I’ve mentioned, I can see at least two ways to get the necessary information out of the document. Both rely on a little-used feature of HTML: the scheme attribute of the meta element. The HTML 4.01 specification defines scheme as follows:

This attribute names a scheme to be used to interpret the property’s value (see the section on profiles for details).

Reading the section on profiles uncovers the following discussion of the use of scheme:

The scheme attribute allows authors to provide user agents more context for the correct interpretation of meta data. At times, such additional information may be critical, as when meta data may be specified in different formats. For example, an author might specify a date in the (ambiguous) format “10-9-97”; does this mean 9 October 1997 or 10 September 1997? The scheme attribute value “Month-Day-Year” would disambiguate this date value.

At other times, the scheme attribute may provide helpful but non-critical information to user agents.

It then goes on to give another example (using a value of “ISBN” for a meta element representing the ISBN of a book) as an illustration of the versatility of the scheme attribute, and delegates responsibility for defining schemes and their meanings to specific profiles used in HTML documents.
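To make the spec's date example concrete, here is a minimal Python sketch of what "interpreting a value according to its scheme" could look like. The SCHEME_FORMATS table and the interpret function are names I made up for illustration, and the "Day-Month-Year" entry is an assumed counterpart which the spec text quoted above does not actually name.

```python
from datetime import datetime

# Hypothetical lookup table: scheme value -> strptime format.
# "Month-Day-Year" comes from the HTML 4.01 example; "Day-Month-Year"
# is an assumed counterpart added only to show the other reading.
SCHEME_FORMATS = {
    "Month-Day-Year": "%m-%d-%y",
    "Day-Month-Year": "%d-%m-%y",
}


def interpret(content, scheme):
    """Parse an ambiguous date string using the format its scheme names."""
    return datetime.strptime(content, SCHEME_FORMATS[scheme]).date()


as_mdy = interpret("10-9-97", "Month-Day-Year")
as_dmy = interpret("10-9-97", "Day-Month-Year")
print(as_mdy, as_dmy)
```

The same string yields two different dates depending on the scheme, which is exactly the ambiguity the attribute is meant to remove.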

And if you look in my sample event-description document, you’ll find a couple of meta elements making use of this:

<meta name="date" scheme="RFC3339" content="2008-06-29T06:08:39-05:00">
<meta name="event-datetime" scheme="RFC3339" content="2008-07-06T12:00:00-05:00">

The first of these is a fairly standard use of meta; generally, “date” refers to the date and time on which the document was authored. The second has a name which I chose largely at random, but which can be assumed to stand in for a value which could be specified by, say, a microformat.

Both of them make use of the scheme attribute, and specify a value of “RFC3339”. RFC 3339 is a document which defines a standard for representing timestamps on the Internet, based on the (broader in scope) ISO 8601 standard.

This suggests one way to solve the challenge above: look for a meta element with the name “event-datetime”, see that its scheme value is “RFC3339” and treat its content as an RFC 3339 timestamp. From there, any decent programming language will give you the necessary tools to parse the string “2008-07-06T12:00:00-05:00” into an object representing a date and time, and then reformat it however you like.
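For the curious, here is a rough sketch of that first approach using only the Python standard library. It is not the eleven-line program mentioned earlier, just one way the idea could be written down. The network fetch is left out and the relevant markup is inlined as SAMPLE; MetaCollector and event_datetime are names invented for this sketch, and datetime.fromisoformat is assumed to be available (Python 3.7 or later).

```python
from datetime import datetime
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect (scheme, content) for every named meta element."""

    def __init__(self):
        super().__init__()
        self.metas = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            if "name" in attributes:
                self.metas[attributes["name"]] = (
                    attributes.get("scheme"),
                    attributes.get("content"),
                )


def event_datetime(html, name="event-datetime"):
    """Return the named meta timestamp as a timezone-aware datetime."""
    collector = MetaCollector()
    collector.feed(html)
    scheme, content = collector.metas[name]
    # Only trust the content as a timestamp if the scheme says so.
    if scheme != "RFC3339":
        raise ValueError(f"unexpected scheme: {scheme!r}")
    # fromisoformat accepts this form (date, "T", time, numeric offset).
    return datetime.fromisoformat(content)


# Inlined stand-in for the retrieved document (assumption: only the
# elements quoted in this entry are reproduced here).
SAMPLE = """<html><head>
<meta name="date" scheme="RFC3339" content="2008-06-29T06:08:39-05:00">
<meta name="event-datetime" scheme="RFC3339" content="2008-07-06T12:00:00-05:00">
</head><body><span class="event-datetime">12:00 Sunday</span></body></html>"""

when = event_datetime(SAMPLE)
print(when.strftime("%Y-%m-%d %H:%M %z"))
```

Checking the scheme before parsing is the whole point: a consumer that does not recognize the scheme can refuse to guess rather than misread the value.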

An alternative, and perhaps more interesting, way to handle this is to note that the document also contains this bit of HTML:

<span class="event-datetime">12:00 Sunday</span>

This is the “human-readable” version of the event date. Parsing this one is a bit trickier, but is still possible: you know, from the meta elements above, the date and time when the document was authored, and thus have a reliable anchor from which to calculate the relative date and time of “12:00 Sunday”. It’s not quite as simple to do this as to simply use the “event-datetime” meta element, but hey, if Remember the Milk can figure out what I mean from nothing more than the word “Sunday”, then a programmer armed with a day of the week, a time and a full base timestamp with time zone should be able to work this one out.
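Here is a hedged sketch of that second approach in Python. One wrinkle is worth spelling out: the authoring date in the “date” meta element, June 29, 2008, is itself a Sunday, so “12:00 Sunday” could mean either that same day or the following week. The sketch assumes a bare weekday name means the next such day strictly after the authoring day, which happens to agree with the “event-datetime” meta element; that rule is my assumption, not something the document states. resolve_relative and WEEKDAYS are names made up for this example.

```python
from datetime import datetime, timedelta

# Ordered so that list index matches datetime.weekday() (Monday == 0).
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve_relative(text, base):
    """Resolve text like '12:00 Sunday' against a timezone-aware base."""
    clock, day_name = text.split()
    hour, minute = (int(part) for part in clock.split(":"))
    target = WEEKDAYS.index(day_name.lower())
    # Assumption: pick the next matching weekday strictly after the
    # base day, so the result is always 1 to 7 days ahead.
    days_ahead = (target - base.weekday() - 1) % 7 + 1
    day = base + timedelta(days=days_ahead)
    # Keep the base's UTC offset; only the wall-clock time changes.
    return day.replace(hour=hour, minute=minute, second=0, microsecond=0)


base = datetime.fromisoformat("2008-06-29T06:08:39-05:00")
event = resolve_relative("12:00 Sunday", base)
print(event.isoformat())
```

Swapping in a different rule (say, "the first matching time at or after the base") would put the event on June 29 instead, which shows how much interpretation the human-readable form leaves open.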

A starting point

Of course, this isn’t a perfect solution, or anything approaching it; there are plenty of unanswered questions and unsolved problems lurking here (how do you handle multiple timestamps in the same document, for example?), and it probably isn’t a new idea. But it is the beginning of a possible solution, and it has some advantages over, say, using abbr or title to provide a “machine readable” version of a timestamp:

  1. It doesn’t fuck with screen readers.
  2. It uses a feature of HTML in a manner consistent with that feature’s documented purpose.
  3. In simple cases, it’s really easy to deal with.
  4. It opens up a way to gradually merge the ideas of “human readable” and “machine readable” timestamps, because those don’t necessarily have to be different things (without forcing people to learn the machine format).

There are probably lots of potential solutions which can do this with the same advantages; this WaSP article mentions a few. If you’ve got an idea for another, run it up the flagpole and see if anyone salutes.

On June 29, 2008, Simon Willison said:

Aah, but you’ve broken a cardinal rule of Microformats: “no invisible metadata”. I’m personally not convinced by the justifications for this rule (SEO types abused the meta element in the past because it wasn’t visible, and invisible metadata tends to get out of sync with visible information in hand edited pages) but the Microformats community is basically set on that one. http://microformats.org/wiki/invisible-data-considered-harmful

On June 29, 2008, PeterMHoward said:

I’m not sure the meta tag for “event-datetime” works in the microformats context, because you’re restricted to talking about the entire document. But I do like the idea of using the “date” one to specify a baseline date/time from which to interpret things like “Sunday”.

All told though, microformats and dates are pretty confusing — the way they’re specified currently makes things a little tricky for document authors, but un-ambiguous for a machine; going down your “sunday 12pm” path makes the authoring ridiculously easy, but the machine consumption much harder, with just a little too much room for interpretation (imagine a month’s worth of “sunday 12pm”s).

Maybe while we try and hammer out a middle ground HTML5 can solve the problem for us (I read that WaSP article and had been skimming the comments when suddenly I realised they were 13 months old, and we’re still having the same discussions!)

-p

On June 29, 2008, James Bennett said:

Aah, but you’ve broken a cardinal rule of Microformats: “no invisible metadata”.

I’m not sure the current abbr hacks really count as “visible”, though; are they considered “visible” simply because most user-agents style them (or allow authors to style them) and display a tooltip on hover? That’s awfully shaky ground (since “visibility” is then dependent on the behavior of prevailing user-agents), which is why I’ve never bought into it.

On June 29, 2008, Alex Holt said:

James is right, there’s nothing to stop the Mozilla guys reading this article, thinking that a meta element in the head is the BEST way to deal with dates and then adding a hover that associates the event-datetime element with the meta data to create a hover - bringing the “invisible” meta tag into the same realm as abbr or title.

On June 29, 2008, Jeremy Keith said:

Your problem statement says:

…how do you embed both a “human friendly” and “machine friendly” representation of the same date/time in an HTML document…

But hCalendar doesn’t map to a document. One document can (and often does) contain multiple events. So even if someone accepts your mission and succeeds, it doesn’t provide a viable alternative for publishers of more than one event per document.

If the challenge is restated as:

…how do you embed both multiple “human friendly” and “machine friendly” representations of the same dates/times in an HTML document…

…then the real scope of the issue is clearer.

I know that you acknowledged this when you said “how do you handle multiple timestamps in the same document, for example”, but I wanted to point out that this isn’t a trivial edge-case. Rather, it’s the default for most published hCalendar data (the BBC programme schedules being a good example).

On June 29, 2008, Andrew Ingram said:

My thought was that if a user agent sees anything in RFC3339 (or any other distinctive datetime format), it could take the task of translating it into the correct colloquial format for the user (ie the date would be read out as Month-Day-Year in the US or Day-Month-Year in the UK).

How hard can it be for the authors of JAWS to teach it to recognise RFC3339 dates? I’m not keen on abusing the semantics of abbr like hCalendar, but it seems that half the time we forget that screen-readers could get on board with supporting web standards (even de facto ones like microformats) just like everyone else.

On June 29, 2008, Andrew Ingram said:

Jeremy:

Supporting multiple timestamps is surely as simple as having multiple meta tags using different classes? Or you could do multiple timestamps within the content of a single meta tag.

I don’t see the finer points of how this solution would work as being a particularly huge issue, the main issue is whether there are valid objections to the solution as a whole.

The main objection I can see is that people might want microformats to be self-contained blobs of semantics with no dependencies on information located outside of the microformat.

On June 29, 2008, Eivind Uggedal said:

Ruby solution:

%w[open-uri rubygems hpricot].each { |lib| require lib }

puts Time.xmlschema((Hpricot(open(ARGV[0]))%"meta[@name='event-datetime']")[:content]).strftime ARGV[1]

Usage:

ruby microformat-event-detail.rb http://media.b-list.org/files/microformat-event-detail.html "%m-%d-%Y %I:%M%p"

On June 29, 2008, Marty Alchin said:

Personally, I like the idea of conveying date information in the form of a link. Most situations I can think of where hCalendar would be useful, there are already links to more events that occurred on that date, or at least having them would be beneficial anyway. Then, the human representation could be anything, and the “machine” representation could be extracted from the link.

All the examples I’ve seen have links that end in one of the two following forms:

It seems like specifying these two formats for dates would suitably “pave the cowpaths” and do the job of supporting multiple dates in a single document. In addition, it would provide the additional usability benefit of encouraging links to date archives for finding other events.

Of course, that doesn’t solve the issue of times, but those are much easier to parse out of a link’s text than dates, so that could be enough. I’m not sure yet how to deal with timezones, except to put those inside the link text as well, but that’s not great. Perhaps that could be a meta tag of something of the sort. It wouldn’t be data specific to any particular event, so it’s really not part of the microformat, but rather a hint as to how to interpret the time data for each event.

I dunno, I haven’t gotten too far into this microformat thing, but it seems like all this is getting far more complicated than it needs to be.

On June 29, 2008, Eddie Welker said:

Andrew,

it seems that half the time we forget that screen-readers could get on board with supporting web standards (even de facto ones like microformats) just like everyone else.

I don’t argue that we shouldn’t try pushing screen-readers in that direction, but the situation is eerily similar to the fact that we have to design our html/css/js around poor implementations. I certainly wouldn’t get my hopes up any time soon.

On July 12, 2008, James Wheare said:

I think you missed a trick by not addressing the issue of multiple events in one document. You could even have marked the event up using the rest of hCal’s less contentious aspects.

Here’s a fully hCal marked up example using your meta date-time idea showing multiple events and relying on ids (also marked up as hAtom for bonus points):

http://james.wheare.org/microformats/meta-datetime.html

On July 12, 2008, James Wheare said:

Also, as to Simon’s point about invisible metadata, in this case the metadata is an alternate form of visible data, so I’m not sure if Tantek’s objections hold.
