Mihai Parparita: Here are the top XML errors that we have encountered when parsing all of the feeds that our users have added to Reader.
% of errors | Error description |
---|---|
15.6% | Input claims to be UTF-8 but contains invalid characters. |
14.9% | Opening and ending tags mismatch |
13.9% | An undefined entity is used (e.g. in an XML document without importing the HTML set) |
7.8% | Document expected to begin with a start tag, but no < was found |
5.7% | Disallowed control characters present |
5.5% | Extra content at the end of the document |
4.2% | Unterminated entity reference (missing semi-colon) |
4.2% | Unquoted attribute value |
3.8% | Premature end of data in tag (truncated feed) |
3.3% | Naked ampersand (should be represented as &amp;) |
2.1% | XML declaration allowed only at the start of the document |
1.8% | Namespace prefix is used but not defined |
0.75% | Comment not terminated |
0.64% | Attribute without value |
http://googlereader.blogspot.com/2005/12/xml-errors-in-feeds.html
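For anyone who wants to gather similar numbers on their own feed collection, here is a minimal sketch in Python using the standard library's expat bindings. It is not how Reader produced the table above: the post does not say which parser was used, and expat's messages will not line up one-to-one with the descriptions in the table. The helper names `xml_error` and `tally_errors` and the sample inputs are made up for illustration. Note that a non-validating parser stops at the first fatal error, so each feed contributes at most one entry.

```python
import xml.parsers.expat
from collections import Counter


def xml_error(data: bytes):
    """Return expat's error message for `data`, or None if it is well-formed."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError as exc:
        return xml.parsers.expat.ErrorString(exc.code)
    return None


def tally_errors(feeds):
    """Count the first well-formedness error reported for each feed body."""
    counts = Counter()
    for data in feeds:
        message = xml_error(data)
        if message is not None:
            counts[message] += 1
    return counts


# Hypothetical sample inputs, not real feed data.
samples = [
    b"<feed><title>ok</title></feed>",   # well-formed
    b"<feed><title>oops</feed>",         # mismatched tags
    b"<feed>caf&eacute;</feed>",         # HTML entity with no DTD
    b"<feed>fish & chips</feed>",        # naked ampersand
]
for message, count in tally_errors(samples).most_common():
    print(f"{count}  {message}")
```

Dividing each count by the total number of failing feeds would give a percentage column comparable in shape to the one in the table.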
Randy: It would be interesting to see the percentage of feeds with ill-formed XML broken down by generator (Blogger, WordPress, TypePad, MT, etc). Anybody got that data?
Randy