Really Simple Syndication
Copyright 2003-4 World Readable
I just set up a chicklet in the right sidebar to subscribe to the RSS Blog via LiveMessage Alerts, which lets you receive the blog entries via MSN MSGer. You can also subscribe by clicking here. Try it out!
Want this service for your blog? Get it here. It's free.
Mud's Test: OK, so what exactly is the problem with the feed, and why doesn't it validate?
Randy: To clarify, the feed info element is reporting itself as text/html, but is actually xhtml.
Nick Bradbury, author of FeedDemon: Nice summary of RSS bandwidth-saving techniques by Randy Charles Morin.
Randy: Thanks!
I tried using RSS Digest to add links from my iBLOGthere4iM blog to the BoingBoing blog. The result was ten javascript:document.writeln("");. This was working a few months ago, but has since stopped working. I assume the RSS Digest service is not being maintained.
Update: Reviewed the problem w/ Peter Cooper of RSS Digest and we identified the problem as lack of escaping of certain characters in the javascript src attribute.
Chris Nolan: I enhanced the blog toolkit on the weekend too, and yesterday I got an email from KBCafe.com's RSS validator letting me know that my feed was broken (I had made a typo and forgot to re-validate). I felt really special that this guy Randy took the time to email me to let me know, and then later in the day I realized it's probably just a completely automated system he has set up that crawls feeds, validates them and emails if they fail. Turns out his system is a bit more strict than FeedValidator's too in his handling of whitespace. I've modified my template to accommodate. He's got a # of blogs on his site as well, and he manages the local Toronto bloggers meetup group so maybe I'll check them out in the new year.
Randy: Just a note. There's no automated system that emails you when the Really Simple Validator finds your feed to be broken. I write the emails entirely myself.
Mud's Tests: One validator says this feed is good, while another says there are problems.
Randy: I added some comments to the original post. What the author says is completely correct, except that the validators do in fact respond alike; the difference is caused by a typo and bad usability design (which I gotta work on too). The biggest problem is that both validators do, in fact, report the feed as valid, even though it's really invalid.
I wrote three bookmarklets for my Really Simple Validator. If a Webpage supports auto-discovery, then the bookmarklet should pick up the relevant RSS, Atom or OPML feeds and pass them to the Really Simple Validator.
You can usually just drag these to your links bar, but you can also right-click on them and Add to Favorites... Here's a VBS-based installer.
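For readers wondering what "supports auto-discovery" means in practice, here is a minimal Python sketch of the idea, not the bookmarklet code itself. It assumes a page advertises its feeds through `<link rel="alternate">` tags in the HTML; the set of media types, including the OPML one, and the names `FeedLinkFinder` and `discover_feeds` are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Media types assumed to mark a feed link; the OPML entry in particular is a guess.
FEED_TYPES = {
    "application/rss+xml",
    "application/atom+xml",
    "text/x-opml",
}


class FeedLinkFinder(HTMLParser):
    """Collects the href of every <link rel="alternate"> tag that advertises a feed."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute names arrive lowercased; values may be None for bare attributes.
        a = {k: (v or "") for k, v in attrs}
        rels = a.get("rel", "").lower().split()
        if "alternate" in rels and a.get("type", "").lower() in FEED_TYPES and a.get("href"):
            self.feeds.append(a["href"])


def discover_feeds(html):
    """Return the feed URLs advertised in an HTML page, in document order."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feeds
```

A bookmarklet would do the same walk over the live DOM and then hand each URL to the validator.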
<a href="http://www.newsgator.com/ngs/subscriber/subext.aspx?url=RSS_URL_HERE">
<img src="http://www.newsgator.com/images/ngsub1.gif" alt="Subscribe in NewsGator Online" border="0"></a>
Copy the above HTML to your site, and replace "RSS_URL_HERE" with your feed URL (Atom or RSS). When users click on the image, they will automatically add a subscription to your feed.
Randy: Shall do tonight.
Update: Done!
Rogers Cadenhead: OPML 1.1 was announced here.
Randy: I always wondered what OPML 1.1 was vs. OPML 1.0
According to my Webstats and user-agent reporting...
A high score could mean either popularity or a bad RSS polling algorithm.
I inadvertently deleted the code that handles the Unicode BOM from the Really Simple Validator. Yikes! I'm still lacking test cases, even though the test case database is already over 1000.
Update: Fixed!
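For anyone curious what BOM handling looks like, here is a rough Python sketch under my own assumptions; it is not the validator's actual code. The helper name `strip_bom` and the choice to return the detected encoding alongside the payload are illustrative.

```python
import codecs

# Checked longest-first so a UTF-32 LE mark is not mistaken for a UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def strip_bom(data):
    """Return (encoding, payload); encoding is None when no BOM is present."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, data[len(bom):]
    return None, data
```

Stripping the mark before handing the bytes to an XML parser avoids "content before the XML declaration" style errors on parsers that do not skip it themselves.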
Jeff: We have come to the decision to shut the ILoveRSS forum community down because of lack of member participation.
Randy: Thanks for your great experiment.
Yahoo!: Your message to the rss-user group was not approved. The owner of the group controls the content posted to it and has the right to approve or reject messages accordingly. In this case, your message was automatically rejected because the moderator didn't approve it within 14 days.
Randy: Dave Winer once asked me what I would do if I were on the RSS advisory board. I replied, "that the most important thing is to get user questions answered." Unfortunately, getting the user questions answered is not easy.
Chris Schmidt: There are a number of ways to make your FOAF file more commonly known on the web, and you can perform any number of them, depending on how interested you are in having FOAF start working for you.
Source: Le plan B de .Conforme.
rsv User: Thanks so much for your help so far! I'm going to yank that Doctype line completely and see how it looks... ..... Woohoo! My feed now shows as validating! I also tossed in that extra little snippet that Firefox needs to show the feed as a Live Bookmark. My users are going to love this. And I'm going to keep an eye on your excellent RSS site.
rsv User: Anyway, THANK YOU! I've been going nuts comparing my OPML to others and couldn't find the difference (look at every character I guess!). Thanks again.
Randy: The satisfaction from helping people is often realized in a simple email or two.
Every once in a while, I like to repeat this message. Where's the RSS Advisory Board? The current members are MIA, save Adam Curry. If Harvard and two thirds of the RSS Advisory Board are no longer willing to move RSS, then let's start talking about where RSS should live. Can anybody confirm that Steve Zellers is a real person? I can't. Let's get the real people behind the RSS movement to lead us forward: Dare Obasanjo, Don Park, Robert Scoble and Scott Johnson.
Question: How come Atom and RSS are not registered media types? What do we have to do to change this?
Found a previous attempt to register by mnot in the Google cache. At one point, this was also an IETF Internet draft, now 404. And here's a non-404 but expired copy of mnot's work.
mnot: application/rss+xml isn't registered, because the IESG wanted a "stable reference" for the spec (it being in the standards tree). So, it's technically incorrect to use it now; this is one of the reasons this is still a confusing issue.
Randy: Politics.
Today, I encountered this really weird problem trying to read a particular RSS feed. In fact, I've stumbled across a few RSS feeds w/ this problem. They seem to be related, but I've yet to figure out the exact problem. In fact, if I copy the RSS files to my server, the problem doesn't replicate. The issue relates to RSS version 0.91 feeds w/ a blank line located either directly above or below the DOCTYPE declaration. But, since I can't replicate the problem w/ the same file on my server, there must be a secondary symptom relating to the HTTP headers (or so I speculate). Here's the .NET code that fails, using Ken MacLeod's RSS feed.
System.Xml.XmlDocument doc = new System.Xml.XmlDocument();
The issue does not affect many .NET-based RSS readers at all, but it fails predictably. So, I assume that most .NET-based RSS readers don't use this code construct. By the way, the test does not fail using a local copy of the same RSS file. The problem is not new, as I've discovered an old M$FT newsgroup posting w/ the same error message.
Update: In all cases, the problem was unrelated to anything I've mentioned above. It was all red herrings that seemed relevant until I found the root cause. Both files were, in fact, invalid XML. They had extra invalid characters in the wrong places. In Ken's case, he has invalid characters before the XML declaration. In the second case, he had a couple extra invalid non-space characters between the XML declaration and the DOCTYPE declaration. Case closed.
Update: Case unclosed. The author of the second feed has informed me that the couple extra invalid characters were accidentally added when he tried to delete his DOCTYPE. I had advised him such. He deleted the extra characters and the DOCTYPE and his feed is fine now.
Update: Case re-opened. A reply from Ken indicated that he thought his feed was correct. So, back to reading bytes of an HTTP response. I then caught on to the fact that Ken's server was using chunked transfer coding, which explains the extra bytes I was seeing: the chunk length. The question remains, "Why doesn't Ken's feed work w/ the .NET XmlDocument.Load method?"
Update: In case anyone is wondering, I've tested other feeds w/ chunked encoding and they don't all fail. I've encountered only three other feeds that fail in the same manner, but the list grows.
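To show where those "extra bytes" come from, here is a small Python sketch of decoding a chunked HTTP/1.1 body. It is only an illustration of the wire format (a hex chunk length on its own line before each chunk, ending with a zero-length chunk); the function name `decode_chunked` is mine, and trailer headers are simply ignored.

```python
def decode_chunked(body):
    """Decode an HTTP/1.1 chunked message body (bytes) into the raw payload."""
    out = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)
        # The size line is hexadecimal and may carry ";extension" parameters.
        size_field = body[pos:eol].split(b";", 1)[0].strip().decode("ascii")
        size = int(size_field, 16)
        pos = eol + 2
        if size == 0:
            break  # last chunk; any trailer headers are ignored in this sketch
        out += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(out)
```

Reading the raw socket without this step puts a line like `5` or `1f4a` ahead of the XML declaration, which is easy to mistake for invalid characters in the feed itself.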
I exported my Blogroll from Sharpreader yesterday. I found two problems w/ the generated OPML. The first, I already knew and fixed before the upload. That is, the title of the blog is encoded in the title attribute of the outline, rather than the text attribute of the outline. The spec is clear: it should be text, not title. The second problem, I figured out only after I ran the Really Simple Validator on my blogroll.
By the way, I donated $10 to Sharpreader today. Thanks Luke, for authoring my RSS reader.
I added FOAF validation to the Really Simple Validator. It's generating a lot of unexpected results on existing FOAF files. I suspect this is caused by four factors.
Danny found an issue w/ the validator. Reading the Dublin Core XML Schema, I incorrectly read (actually, I read correctly) that Dublin Core elements could not have XML content or attributes, with the exception of xml:lang. I've modified the Schematron to allow another attribute, rdf:resource. I've also allowed XML content after re-reading the RDF primer.
I haven't deployed this fix yet, but soon. I'm looking for more exact feedback, so here's what I described in the Schematron for Dublin Core.
<assert test="not(@*[name()!='xml:lang' and name()!='rdf:resource'])">
By the way, this is another experience w/ RDF that scares me away from using it. Too complex!
Update: Danny has posted on this subject. Which leads me to note, I should read the RDF Syntax Grammar and convert it into Schematron.
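For feedback purposes, the attribute rule above can be approximated outside Schematron. The Python sketch below is an assumption-laden equivalent, not the validator's code: it matches on namespace URIs rather than on prefixed names as `name()` does, and the function name `bad_dc_attributes` is hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "{http://purl.org/dc/elements/1.1/}"

# The only attributes the rule tolerates on a Dublin Core element.
ALLOWED = {
    "{http://www.w3.org/XML/1998/namespace}lang",             # xml:lang
    "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource",  # rdf:resource
}


def bad_dc_attributes(xml_text):
    """Return (element, attribute) pairs for disallowed attributes on dc:* elements."""
    root = ET.fromstring(xml_text)
    problems = []
    for el in root.iter():
        if el.tag.startswith(DC_NS):
            for name in el.attrib:
                if name not in ALLOWED:
                    problems.append((el.tag[len(DC_NS):], name))
    return problems
```

Matching on namespace URIs means a feed that binds the RDF namespace to a prefix other than `rdf` still passes, which the prefix-based test would reject.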
RSS Specifications: We have added graphics to the RSS graphics collection, that webmasters can use to denote OPML graphics. OPML files contain collections of RSS feeds, adding the graphic to websites will let visitors know that multiple feeds are available.
Randy: I've always wondered, who invented the orange XML chicklet?
I've reproduced most of the problems that were encountered by the flustered Blogger user. The first and most trivial problem is that the regular Atom produced by Blogger is invalid. The problem relates to the <info> element, which contains xhtml even though the content type is text/html.
<info mode="xml" type="text/html"><div xmlns="http://www.w3.org/1999/xhtml">This is an Atom formatted XML site feed. It is intended to be viewed in a Newsreader or syndicated to another site. Please visit the <a href="http://help.blogger.com/bin/answer.py?answer=697">Blogger Help</a> for more info.</div></info>
This is not a biggy, but don't worry, things get a lot worse. The next problem happened when the user tried to enhance his feed w/ FeedBurner. FeedBurner changed his Atom timezones and, since the feed:modified must have the UTC timezone, his Atom feed became invalid for a new reason.
<modified>2004-11-12T20:50:42-06:00</modified>
Don't worry, things are going to get worse. By the way, the CEO of FeedBurner is Dick Costolo, who doesn't know it, but I helped him become a rich person. I had a big say in purchasing his previous company, SpyOnIt. I was in charge of alerts at 724 when we purchased SpyOnIt. SpyOnIt was a great company (I was their biggest user) and I think FeedBurner is looking almost as good.
At one point, FeedBurner was stripping the author, modified, issued and id elements from the entry element. This one I was never able to reproduce, but I know he had FeedBurner's SmartFeed turned on, because it was responding in RSS 2.0 to my validator and Atom to the FeedValidator.org.
Finally, after a few red herrings, the frustrated blogger removed a couple special characters from his Blogger post and everything just sort of clicked. He's a little happier and has considered starting a help the RSS end-user type of blog/FAQ.
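The timezone problem in this story is easy to check mechanically. Here is a minimal Python sketch, assuming (as the post says) that the modified date must be in UTC; the regular expression, the name `is_utc_timestamp`, and the decision to accept a zero offset as well as `Z` are my own assumptions.

```python
import re

# W3C date-time with seconds and an explicit zone, e.g. 2004-11-12T20:50:42-06:00
W3C_DATETIME = re.compile(
    r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(Z|[+-]\d{2}:\d{2})$"
)


def is_utc_timestamp(value):
    """True when the value is a full date-time whose zone is Z or a zero offset."""
    m = W3C_DATETIME.match(value.strip())
    if not m:
        return False
    return m.group(1) in ("Z", "+00:00", "-00:00")
```

Run against the value above, `is_utc_timestamp("2004-11-12T20:50:42-06:00")` is false, while the same instant written as `2004-11-13T02:50:42Z` passes.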
The real victim of the Syndication Wars is the end user. As we fight for RSS and Atom and all that API crap, the end user finds himself struggling w/ incompatible technologies. Earlier today, I spent a good part of an hour trying to help a user who couldn't subscribe to his own Blogger feed using My.Yahoo! Here's this user's rant from our Yahoo! MSGer conversation, which he agreed I could distribute w/ personal references removed.
My answer to him?
I input all the OPML files at the OPML Directory into my little utility and ran the numbers on the use of each feed type. There were too many hanging chads (bad feeds) to compensate for, so after a day of rejigging the logic to compensate for bad XML, the final results ended up as follows...
These are the number of feeds of each RSS version.
I have to say that Tim Bray has done an absolutely fantastic job of cleaning up Atom. A look at the latest Atom format spec shows a much simpler syntax than the deployed version (version 0.3). Some great changes are...
Now, I don't care too much about the format, but if he can do the same to the Atom protocol spec, then we have a winner. I might create XSD, Relax NG and Schematron for this new format spec.
This week, I added RSS Link Module support to the Really Simple Validator. I don't yet have any test cases, so feedback/errata would be great. I'm going to try and scrape a bunch of RSS feeds that use the module to test from. Time permitting.
Yahoo: When you update your site, you can also ensure My Yahoo! gets updated by using our API. Our system will schedule an immediate refresh of your site so that My Yahoo! has the most up-to-date version of the RSS feed.
Randy: Did you know Yahoo! has an RSS ping service?
Don Park: Even if victory is declared now for the Atom feed format, it will just start a Brand War since feature differences are minor, leaving only the brand as the primary differentiator. As I pointed out seemingly ages ago, the area Atom could have made the biggest impact is in the protocol/API. But as you can see by the level of activities in the Atom protocol mailing list, work on the protocol hasn't gained much ground, let alone converged. I don't think it's too late to use RSS 2.0 as the starting point and build on it without breaking backward compatibility. Most of the items in Tim's list of advantages can be added to RSS 2.0 as extensions, either individually or as a set called Atom. This will allow the Atom WG to focus on the protocol instead of getting angry.
Randy: Atom, we don't need YASF (Yet Another Syndication Format). Give us an API that works!
I just wrote a neat little utility for running HTTP GET requests. I use it to look at the HTTP headers of RSS requests when RSS users need a little help. The core of the utility is what happens when you click the GO button. Here's open source for you.
// Parse the address typed into the text box and open a raw TCP connection.
System.Uri uri = new System.Uri(this.textBoxAddress.Text);
System.Net.Sockets.TcpClient client = new System.Net.Sockets.TcpClient();
client.Connect(uri.Host, uri.Port);
System.Net.Sockets.NetworkStream stream = client.GetStream();
// Send a bare HTTP/1.1 GET; a single blank line ends the header block.
string req = "GET " + uri.PathAndQuery + " HTTP/1.1\r\nHost: " + uri.Host + "\r\n\r\n";
this.textBox.Text = req;
byte[] bytes = System.Text.Encoding.UTF8.GetBytes(req);
stream.Write(bytes, 0, bytes.Length);
stream.Flush();
// Wait up to 10 seconds for the first bytes of the response.
for (int i = 0; i < 100; i++)
{
    if (stream.DataAvailable)
    {
        break;
    }
    System.Threading.Thread.Sleep(100);
}
bytes = new byte[1024];
System.Text.StringBuilder sb = new System.Text.StringBuilder();
sb.Append(req);
// Dump whatever has arrived so far; that is enough to see the headers.
while (stream.DataAvailable)
{
    int count = stream.Read(bytes, 0, bytes.Length);
    if (count == 0)
    {
        break;
    }
    sb.Append(System.Text.Encoding.UTF8.GetString(bytes, 0, count));
}
this.textBox.Text = sb.ToString();
// Release the socket once the dump is on screen.
client.Close();
Ya, it's just a regular old HTTP request, but done at the TCP level so I can dump all the bytes. Don't use it to view the RSS, I use it to get the HTTP headers and stop there. You can download the zipped utility right here. Enjoy!
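For readers without a .NET toolchain, the same trick can be sketched in Python. This is an illustrative port under my own assumptions, not the zipped utility: the function names are hypothetical, a `Connection: close` header is added so the server hangs up, and reading stops as soon as the blank line that ends the headers has arrived.

```python
import socket


def build_get_request(host, path_and_query):
    """Build a bare HTTP/1.1 GET request as bytes."""
    return ("GET " + path_and_query + " HTTP/1.1\r\n"
            "Host: " + host + "\r\n"
            "Connection: close\r\n"
            "\r\n").encode("ascii")


def header_block(raw):
    """Return the status line and header lines, dropping everything after the blank line."""
    head, _, _ = raw.partition(b"\r\n\r\n")
    return head.decode("iso-8859-1").split("\r\n")


def fetch_headers(host, path_and_query="/", port=80, timeout=10):
    """Fetch only the raw response headers over a plain TCP socket."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_get_request(host, path_and_query))
        raw = b""
        # Stop reading once the end of the header block has been seen.
        while b"\r\n\r\n" not in raw:
            chunk = sock.recv(1024)
            if not chunk:
                break
            raw += chunk
    return header_block(raw)
```

Calling `fetch_headers("example.com", "/rss.xml")` (a placeholder host and path) returns the header lines exactly as the server sent them, malformed names included, which is the whole point of going below the HTTP library.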
Reader: First off, great site. I've finally found some info on the problem I keep hearing about from my readers. And your validator actually shows me an error, instead of incorrectly validating. If I can trouble you for a minute of your time, can you check out my feed? I don't know how to fix it. It works fine in FeedReader, but is blowing up elsewhere. Any tips are a big help.
Randy: Gotta love positive feedback like this. The problem was a space in the HTTP header-name. Another problem that isn't addressed by the other feed validators. Similar to the Wordpress problem.
WritTorrent Project: First off, let me just say that this is a hack. I don't even program in Java... this is my first. It generates a basic RSS 2.0 feed with support for categories, and hosting of torrents, all from within Azureus itself. Okay!
Randy: Second off, I don't really know what this does, but it sounds cool and somebody sent the link to me. That's worth a free blog.
Today, iM taking RssReader for a test drive. I've never used this one before and first impression is that it's one of the best. I really like the reading mode. I exported/imported my blogroll from Sharpreader. RssReader's import is rather painful, needs a little usability work. Reading in RssReader is painless, I can view all Unread items from the last 24 hours for all feeds in one HTML window. Finally!
Update: RssReader is good, but not great. I still like Sharpreader.
Over the last bit, we've seen Scott Johnson announce that Feedster is giving away iPods to the best community-developed Feedster solutions and we've seen the rise of the FeedsterHacks blog. Well, building on top of my already existing Juice bookmarklet library, here's my Feedster bookmarklet library.
Tested in IE6 and Moz. VBS based installer for IE.
Tim Bray: The world can use Atom, sooner rather than later. The return-on-investment of further WG time invested in polishing something that's already pretty good is starting to be very unattractive.
Dare Obasanjo: So far Atom is a less featureful version of RSS 2.0. As it stands now given the current cost/benefit of Atom over RSS 2.0, the current Atom draft + categories isn't something I'd advise anyone at my employer's to use if it became an RFC nor is it something I believe we'd use on the stuff I work on directly (http://spaces.msn.com for one).
Randy: Both Dare and Tim are correct. The Atom format by itself is of little use over RSS 2.0, since it's effectively a 1-to-1 rewrite of RSS 1.0. On the other hand, Atom is really getting stale. W/out action, it's dead!
Nov 7: We're once again having trouble with a database server today and have had to restart it multiple times.
Nov 5: We had another significant problem with a database server last night that would have resulted in a large number of errors and problems with accessing posts.
Nov 3: One of the database servers was having performance trouble for several hours this morning starting at about 5am PST.
Nov 1: Mail-to-Blogger is currently having some problems
Randy: Posting to Blogger has become next to impossible. The users are starting to complain.
Question: Anything happen lately at Blogger? Any key employees leave?
It's been a while since I tried FeedReader, but I thought I'd give it another go today. The install went well, but then came the annoying endless notification popups that consumed half of my display. This was followed w/ the UI constantly activating itself over top of my working window. A positive: FeedReader produced much better OPML than any of my usual readers (Sauce Reader, Sharpreader and RSS Bandit). FeedReader correctly uses the text attribute as the title of the outline, rather than the unspecified title attribute.
Technorati, against the wishes of Dave Winer, released a new Technorati at BloggerCon III. I've noticed that after more than half a year of Technorati nothingness, the engine started working again today. I got a response in less than one minute, in fact, less than one second. Congrats to David Sifry and Kevin Marks!
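The text-versus-title point is simple to test for in an exported blogroll. Here is a short Python sketch of such a check; it is not the Really Simple Validator's rule, just an illustration, and the function name `outlines_missing_text` is hypothetical.

```python
import xml.etree.ElementTree as ET


def outlines_missing_text(opml):
    """Return a label for each <outline> that lacks a non-empty text attribute."""
    root = ET.fromstring(opml)
    missing = []
    for outline in root.iter("outline"):
        if not outline.get("text"):
            # Fall back to whatever identifies the outline for the report.
            missing.append(outline.get("title") or outline.get("xmlUrl") or "(unnamed)")
    return missing
```

An export that only fills in `title` shows up in the returned list, so the fix (copying the title into `text`) can be applied before the blogroll is uploaded.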
Update: It's a little on and off right now, but much better than just a couple days ago.
Update: After a brief return to the functioning state, Technorati has returned to the borked steady state.
I was listening to the BloggerConIII streaming audio today. That got me back onto IT Conversations, which I haven't used in many months. Now I'm listening to a bunch of other crap. They have this feature called Your Queue. You can queue up and listen to streams. Here's what's in my queue.
They have podcast RSS feeds. Hmmm! An excellent source of high quality podcasts.
Danny Ayers: Suzan Foster has a schema (announcement) for media "enclosures" with RSS 1.0 feeds.
<rss:item>
  ...
  <enc:enclosure>
    <enc:Enclosure>
      <enc:type>foo/bar</enc:type>
      <enc:length>65536</enc:length>
      <enc:url>http://foo.bar/baz</enc:url>
    </enc:Enclosure>
  </enc:enclosure>
</rss:item>
Randy: Great idea! Note, I don't understand the uppercase E enc:Enclosure inside the lowercase e enc:enclosure.
Recently, I wrote the Schematron schema to validate Atom version 0.3. I wrote the rules to conform as closely as I could to the actual Atom 0.3 specification. I finished the Schematron earlier this week and immediately set out to find how valid the existing Atom feeds are.
Blog | Really Simple Validation | FeedValidator |
---|---|---|
Intertwingly (Sam Ruby) | ||
evhead (Evan Williams) | ||
Six Apart | ||
The Real Geek on Blogger (me) | ||
.Conform (Philippe Janvier) | ||
.Conform Blogmarks | ||
Salad w/ Steve (Steve Jenson) | ||
Atom Enabled | ||
That should be enough to prove my point. About half of the Atom feeds are invalid. I should also point out that the FeedValidator was incorrect more often than not, pointing out validation issues that didn't exist in the spec and missing other validation issues present in the spec. Why is it so difficult to create a valid Atom feed? The problem is that Atom is simply too complex. Examples of this complexity follow:
Update: Here's another example of why Atom is complex. This fragment is from the feed of Kevin Marks, blogger extraordinaire at Technorati.
<info mode="xml" type="text/html">
I added a new test case to my validator to flag this.
This last year, I've been switching back and forth between three RSS readers: RSS Bandit, Sharpreader and Sauce Reader. During that year, I've noticed that Wordpress feeds rarely, if ever, work in these .NET-based RSS readers. For the longest time, I simply ignored the problem and even dropped the said feeds from my blogroll, as there was no value in a broken feed anyway. But, as you would have read in my RSS blog, I've been working on a general XML format validator. The first format I addressed was RSS, and this got me thinking that I'd test those Wordpress RSS feeds on my validator, which, by the way, is also partially based on .NET. My validator reported the feed to be broken at the HTTP level. Of course, the FeedValidator was reporting the feeds to be valid! Well, I couldn't take that conflict and decided to find out what was up.
The first thing I did was write a small test console application that tried to load the Wordpress feeds into System.Xml.XmlDocument.
System.Xml.XmlDocument doc = new System.Xml.XmlDocument();
doc.Load(uri.ToString());
This returns a System.Net.WebException with the message "The underlying connection was closed: The server committed an HTTP protocol violation." Google this and you start to see where the problem is. It turns out that Wordpress is broken. I wanted to see this for myself, so I wrote another console application to query the TCP/IP way.
// Open a raw TCP connection to the Web server (port 80 is hard-coded here).
System.Net.Sockets.TcpClient client = new System.Net.Sockets.TcpClient();
client.Connect(uri.Host, 80);
System.Net.Sockets.NetworkStream stream = client.GetStream();
// HTTP lines end in CRLF, and a single blank line ends the request headers.
string command = "GET " + uri.PathAndQuery + " HTTP/1.1\r\nHost: " + uri.Host + "\r\n\r\n";
byte[] by = System.Text.Encoding.UTF8.GetBytes(command);
stream.Write(by, 0, by.Length);
// Crude wait for the server to start responding.
System.Threading.Thread.Sleep(1000);
by = new byte[1024];
System.Text.StringBuilder message = new System.Text.StringBuilder();
// Read whatever has arrived; the headers are at the start of the response.
do
{
    int n = stream.Read(by, 0, by.Length);
    message.Append(System.Text.Encoding.UTF8.GetString(by, 0, n));
} while (stream.DataAvailable);
And the returned HTTP header is?
HTTP/1.1 200 OK
Date: Tue, 02 Nov 2004 17:06:57 GMT
Server: Apache/2.0.40 (Red Hat Linux)
Accept-Ranges: bytes
X-Powered-By: PHP/4.2.2
Last Modified: Tue, 02 Nov 2004 11:45:13 GMT
ETag: "3120b3f942d975a454c923b36a05e837"
X-Pingback: xxx
Connection: close
Transfer-Encoding: chunked
Content-Type: application/rss+xml
Note the Last Modified header. This header-name has a space in it, which is illegal. You can verify this in the HTTP spec. In section 4.2, the header-name is said to be a token and in section 2.2, the token is said to not contain spaces. The Last Modified header is usually written Last-Modified, w/ the hyphen. To fix this problem in Wordpress, search for the header call in your PHP and add the hyphen so that it reads:
@header('Last-Modified: '.$wp_last_modified);
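The token rule is easy to turn into a check. Below is a small Python sketch of how a validator might flag such a header; it is my illustration of the rule from sections 2.2 and 4.2 of the HTTP/1.1 spec, not code from the Really Simple Validator, and the name `invalid_header_names` is hypothetical.

```python
import re

# A token is one or more ASCII characters excluding controls, space, tab and the
# separators ( ) < > @ , ; : \ " / [ ] ? = { }
TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def invalid_header_names(header_lines):
    """Return the header lines whose field name is missing or is not a valid token."""
    bad = []
    for line in header_lines:
        name, sep, _ = line.partition(":")
        if not sep or not TOKEN.fullmatch(name):
            bad.append(line)
    return bad
```

Fed the response above (minus the status line), it flags only the `Last Modified` line, because the space in the name falls outside the token character set.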
Now, I need only email this page to my friends and I can enjoy their blogs again.
Update: As usual, Dare has a fix for RSS Bandit already.
Dare Obasanjo: There's a lot of innovation and interesting end user applications that can be built on RSS today. However many XML syndication geeks are prideful and would rather reinvent the wheel than use existing technology to solve real world problems.
Randy: Sometimes Dare throws in a comment or two that hit all over. Well done! I have pretty much ignored the Atom project these last few months. I've also noticed a large tail-off in activity on the Atom mailing list. There used to be 50 to 100 posts per day on the list; now there are sometimes fewer than a dozen. Also, I haven't heard much from the RSS doesn't scale gang.
David Sifry: Thanks to some initiative and hard work from Kevin Marks, we've put up a page that tracks Vote Links. If you want to show your approval of John Kerry and disapproval of George Bush, you can do it the following way:
<a rel="vote-for" href="http://www.johnkerry.com">John Kerry</a>
<a rel="vote-against" href="http://www.georgewbush.com">George Bush</a>
Randy: Or, if you want to vote for Oleg Dulin and Dave 'Freeke' Walker.
Spotter: Danny Ayers.
W3C: This note describes a project for describing & retrieving (digitized) photos with (RDF) metadata. It describes the RDF schemas, a data-entry program for quickly entering metadata for large numbers of photos, a way to serve the photos and the metadata over HTTP, and some suggestions for search methods to retrieve photos based on their descriptions.
Randy: Bookmarked for later reading.
Robert Scoble: Dave, NewsGator supports enclosures too. (Dave's talking about news aggregators and has been asking lately why more of them don't support RSS 2.0's enclosures). Does your aggregator support enclosures? If so, you can use it to receive podcasts, among other things (Channel 9's RSS feed for videos includes enclosures so you can subscribe to our videos).
Randy: Quite a horror story coming. Following Robert's blog entry, I decided to check if Sauce Reader supported enclosures. So, I opened it up and cut the link to Sauce Reader, hit the subscribe button and....
Could not create microcontent from URL?
My immediate thought was that Sauce Reader didn't work w/ enclosures, until a second glance revealed that the subscription item's title was http://www.bloglines.com/{etc}. Is Sauce Reader using bloglines for subscriptions? I went to another blog that doesn't have enclosures and tried to subscribe. Same error. I tried several other blogs. Same error. Sauce Reader is broken? I can no longer subscribe to any new feeds. Uninstall.
Back to Sharpreader! FYI, doesn't seem to be any enclosure support in Sharpreader. The export/import between Sauce Reader and Sharpreader worked.