Does Atom Validate?

Fri, 05 Nov 2004 04:55:26 GMT

Recently, I wrote a Schematron schema to validate Atom version 0.3. I wrote the rules to conform as closely as I could to the actual Atom 0.3 specification. I finished the schema earlier this week and immediately set out to find how valid the existing Atom feeds are.

Blog	Really Simple Validation	FeedValidator
Intertwingly (Sam Ruby)	Invalid	Invalid
evhead (Evan Williams)	Invalid	Invalid
Six Apart	Valid	Valid
The Real Geek on Blogger (me)	Valid	Valid
.Conform (Philippe Janvier)	Valid	Valid
.Conform Blogmarks	Invalid	Valid
Salad w/ Steve (Steve Jenson)	Valid	Valid
Atom Enabled	Invalid	Invalid

That should be enough to prove my point. By my schema, four of the eight feeds above are invalid. I should also point out that the FeedValidator was incorrect more often than not, reporting validation issues that aren't in the spec and missing others that are. Why is it so difficult to create a valid Atom feed? The problem is that Atom is simply too complex. Examples of this complexity follow:

Some date timezones are optional and others are required.
Some date timezones must be UTC and others can be any timezone.
Relative URLs resolved against xml:base.
The content construct's type and mode attributes.
The defaults for type and mode are text/plain and xml, which causes confusion when both defaults apply.
How do you interpret an entity-encoded (or CDATA) string with a type of text/html and a mode of xml?
How do you interpret an entity-encoded (or CDATA) string with a type of text/plain and a mode of xml?
There are simply too many optional elements to choose from.
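
To make the type/mode questions concrete, here is a rough sketch of how a consumer might decode a content construct. It assumes the three mode values I read in the 0.3 spec (xml, escaped, base64) and the defaults listed above; the function name decode_content is made up for illustration and is not part of any Atom library.

```python
import base64
import xml.etree.ElementTree as ET


def decode_content(element):
    """Return (type, mode, payload) for an Atom 0.3 content construct.

    Assumes the defaults described in the post: type="text/plain", mode="xml".
    """
    media_type = element.get("type", "text/plain")
    mode = element.get("mode", "xml")

    if mode == "xml":
        # Inline markup: the element's own text plus its serialized children
        # (ET.tostring includes each child's trailing text).
        parts = [element.text or ""]
        parts.extend(ET.tostring(child, encoding="unicode") for child in element)
        payload = "".join(parts)
    elif mode == "escaped":
        # The XML parser has already undone one level of entity encoding,
        # so the text is the markup string itself.
        payload = element.text or ""
    elif mode == "base64":
        payload = base64.b64decode(element.text or "")
    else:
        raise ValueError("unknown mode: %r" % mode)

    return media_type, mode, payload


# An entity-encoded string with no attributes falls back to both defaults.
ambiguous = ET.fromstring("<content>&lt;b&gt;hi&lt;/b&gt;</content>")
print(decode_content(ambiguous))
```

With both defaults applied, the payload is the literal characters of the tag, not bold text. The same string with mode="escaped" yields an identical payload, so a consumer has only the attributes to tell it whether the publisher meant markup or plain text, and a feed that omits them leaves that to guesswork.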

Update: Here's another example of why Atom is complex. This fragment is from the feed of Kevin Marks, blogger extraordinaire at Technorati.