Recently, I wrote a Schematron schema to validate Atom version 0.3, with rules that conform as closely as I could manage to the actual Atom 0.3 specification. I finished the schema earlier this week and immediately set out to find out how valid the existing Atom feeds are.
Blog | Really Simple Validation | FeedValidator |
---|---|---|
Intertwingly (Sam Ruby) | ||
evhead (Evan Williams) | ||
Six Apart | ||
The Real Geek on Blogger (me) | ||
.Conform (Philippe Janvier) | ||
.Conform Blogmarks | ||
Salad w/ Steve (Steve Jenson) | ||
Atom Enabled | ||
That should be enough to prove my point. About half of the Atom feeds are invalid. I should also point out that the FeedValidator was incorrect more often than not: it reported validation issues that don't exist in the spec and missed others that do.

Why is it so difficult to create a valid Atom feed? The problem is that Atom is simply too complex. Examples of this complexity follow:
- In some date elements the time zone is optional; in others it is required.
- In some date elements the time zone must be UTC; in others it can be any time zone.
- Relative URLs combined with xml:base.
- The type and mode attributes of content constructs:
  - The defaults for type and mode are text/plain and xml, which causes confusion when both defaults apply.
  - How do you interpret an entity-encoded (or CDATA) string with a type of text/html and a mode of xml?
  - How do you interpret an entity-encoded (or CDATA) string with a type of text/plain and a mode of xml?
- There are simply too many optional elements to choose from.
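
As a rough illustration of the two date items above, here is a minimal Python sketch that classifies the time zone of a W3C-DTF style date string. It is not taken from the Schematron rules; the function name `classify_timezone` and the choice to treat both `Z` and a zero offset as UTC are assumptions made for the example. Which Atom elements require which outcome is left to the spec.

```python
import re

# Date-time with an optional time zone designator (Z or +hh:mm / -hh:mm).
# This pattern is a simplification chosen for the example.
_DATE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(:\d{2}(\.\d+)?)?"
    r"(?P<tz>Z|[+-]\d{2}:\d{2})?$"
)


def classify_timezone(value):
    """Return 'invalid', 'missing', 'utc', or 'offset' for a date string."""
    match = _DATE_RE.match(value.strip())
    if match is None:
        return "invalid"
    tz = match.group("tz")
    if tz is None:
        return "missing"
    # Assumption: a zero offset counts as UTC alongside the literal "Z".
    if tz in ("Z", "+00:00", "-00:00"):
        return "utc"
    return "offset"
```

A validator would then need a separate rule per date element saying which of these results is acceptable, and that per-element difference is where feed authors tend to trip.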
Update: Here's another example of why Atom is complex. This fragment is from the feed of Kevin Marks, blogger extraordinaire at Technorati.
<info mode="xml" type="text/html"><div xmlns="http://www.w3.org/1999/xhtml">...</div>
</info>
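
One way to detect a fragment like the one above is sketched below in Python. It is only an approximation of such a check, not the actual Schematron rule: it assumes the problem is a construct that declares type="text/html" with mode="xml" while carrying XHTML child elements, and the function name `flag_html_typed_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def flag_html_typed_xml(fragment):
    """Return True when the root element declares type="text/html" and
    mode="xml" but contains child elements in the XHTML namespace."""
    root = ET.fromstring(fragment)
    content_type = root.get("type", "text/plain")  # default named in the post
    mode = root.get("mode", "xml")                 # default named in the post
    # ElementTree expands namespaced tags to "{namespace}localname".
    has_xhtml_child = any(
        child.tag.startswith("{" + XHTML_NS + "}") for child in root
    )
    return content_type == "text/html" and mode == "xml" and has_xhtml_child


sample = (
    '<info mode="xml" type="text/html">'
    '<div xmlns="http://www.w3.org/1999/xhtml">...</div>'
    "</info>"
)
print(flag_html_typed_xml(sample))
```
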
I added a new test case to my validator to flag this.
--philippe
Randy