The RSS Blog

News and commentary from the cross-platform RSS and OPML community.

Today, I encountered a really weird problem trying to read a particular RSS feed. In fact, I've stumbled across a few RSS feeds with this problem. They seem to be related, but I've yet to pin down the exact cause. Stranger still, if I copy the RSS files to my server, the problem doesn't reproduce. The failing feeds are RSS 0.91 files with a blank line directly above or below the DOCTYPE declaration. But since I can't reproduce the problem with the same file on my server, there must be a secondary factor, possibly related to the HTTP headers (or so I speculate). Here's the .NET code that fails, using Ken MacLeod's RSS feed:

// Loading this feed over HTTP throws; loading a local copy of the same file does not
System.Xml.XmlDocument doc = new System.Xml.XmlDocument();
doc.Load("http://bitsko.slc.ut.us/blog/index.rss");

Many .NET-based RSS readers aren't affected by the issue at all, yet this code fails every time. So I assume that most .NET-based RSS readers don't use this construct. By the way, the code does not fail against a local copy of the same RSS file. The problem isn't new, either: I've found an old Microsoft newsgroup posting with the same error message.

Update: In all cases, the problem was unrelated to anything I've mentioned above. Those were all red herrings that seemed relevant until I found the root cause. Both files were, in fact, invalid XML, with extra invalid characters in the wrong places. In Ken's case, there were invalid characters before the XML declaration. In the second case, the author had a couple of extra invalid non-space characters between the XML declaration and the DOCTYPE declaration. Case closed.

Update: Case unclosed. The author of the second feed has informed me that the extra invalid characters were accidentally added when he tried to delete his DOCTYPE, which I had advised him to do. He has since deleted both the extra characters and the DOCTYPE, and his feed is fine now.

Update: Case re-opened. Ken replied that he thought his feed was correct. So, back to reading the bytes of an HTTP response. I then caught on to the fact that Ken's server was using chunked transfer coding, which explains the extra bytes I was seeing: they were the chunk length. The question remains: "Why doesn't Ken's feed work with the .NET XmlDocument.Load method?"
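To show where those extra bytes come from, here is a minimal sketch of undoing chunked transfer coding by hand. It assumes the HTTP status line and headers have already been stripped and that the whole body is in memory. Under HTTP/1.1 chunked coding, each chunk is preceded by its size in hexadecimal on a line of its own, and a zero-size chunk ends the body. Anything that reads the raw bytes without decoding them therefore sees those size lines as stray characters ahead of the XML declaration. The names ChunkedDecoder and Decode are hypothetical. This is an illustration of the wire format, not a description of how .NET's own HTTP classes handle chunked responses.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper: strips HTTP/1.1 chunked transfer coding from a raw body.
// Assumes the status line and headers have already been removed.
public static class ChunkedDecoder
{
    public static byte[] Decode(byte[] raw)
    {
        var output = new List<byte>();
        int pos = 0;
        while (true)
        {
            // Each chunk starts with its size in hex, optionally followed by ";extension", then CRLF.
            int lineEnd = IndexOfCrlf(raw, pos);
            if (lineEnd < 0) throw new FormatException("Missing CRLF after chunk size");
            string sizeLine = Encoding.ASCII.GetString(raw, pos, lineEnd - pos);
            int semi = sizeLine.IndexOf(';');
            if (semi >= 0) sizeLine = sizeLine.Substring(0, semi);
            int size = int.Parse(sizeLine.Trim(), NumberStyles.HexNumber, CultureInfo.InvariantCulture);
            pos = lineEnd + 2;

            // A zero-size chunk marks the end of the body; any trailers are ignored in this sketch.
            if (size == 0) break;

            // The chunk data must be followed by its own CRLF.
            if (pos + size + 2 > raw.Length) throw new FormatException("Truncated chunk");
            for (int i = 0; i < size; i++) output.Add(raw[pos + i]);
            pos += size;
            if (raw[pos] != '\r' || raw[pos + 1] != '\n')
                throw new FormatException("Missing CRLF after chunk data");
            pos += 2;
        }
        return output.ToArray();
    }

    // Returns the index of the next CR LF pair at or after start, or -1 if there is none.
    private static int IndexOfCrlf(byte[] data, int start)
    {
        for (int i = start; i + 1 < data.Length; i++)
            if (data[i] == '\r' && data[i + 1] == '\n') return i;
        return -1;
    }
}
```

For example, the chunked body "5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n" decodes to "Hello, world". Left undecoded, the leading "5" and its line break are exactly the kind of stray bytes that sit in front of a feed's XML declaration.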

Update: In case anyone is wondering, I've tested other feeds with chunked encoding, and they don't all fail. I've encountered only three other feeds that fail in the same manner, but the list keeps growing.

Reader Comments
Sat, 20 Nov 2004 00:11:49 GMT

Using 'wget', I'm not seeing any characters before the XML declaration and no unusual characters between the XML declaration and the doctype declaration. There is a comment between the two, but that's allowed. Can you capture the raw HTTP request?

-- Ken "with an a in MacLeod" MacLeod

Sat, 20 Nov 2004 14:31:36 GMT

Ken, it's clear now that your XML is perfectly fine. The problem seems to be an obscure bug in .NET. I've transferred your file, bytes intact, to my own server, and the exception doesn't happen. I've also reproduced the problem with several other feeds, but can't find any commonality among them.

Randy

Sat, 20 Nov 2004 15:40:10 GMT

If the server makes a difference, I do happen to be using Apache 2.

Sat, 20 Nov 2004 18:20:42 GMT

I've checked the originating servers and it's a mix of Apache and IIS. I haven't found any commonality other than DOCTYPE. I'll compile my results and send them to Microsoft.

Randy

Sat, 10 Jul 2010 11:49:49 GMT

I'm trying to use an RSS feed and I'm getting the error "Expected DTD markup was not found". Please help me out.

Thanks Sir
