Really Simple Syndication
Copyright 2003-4 Randy Charles Morin
All the errors are caused by trying to use RSS 2.0 elements in RSS 1.0. I wish the FeedValidator would emit these as errors, or at least warnings.
Update: I just noticed that link is really just a Yahoo! news link. Hmmm! Are Yahoo! news feeds that bad too? Fortunately, they are much better. The Top Stories feed looks pretty awesome. In fact, I can't find anything wrong w/ it.
Marc Hedlund: Bloglines today announced a set of new web services APIs, allowing developers to write applications for reading RSS and Atom feeds by drawing data directly from the Bloglines databases. [cut] By drawing feeds from the Bloglines database, developers are presented with a single format--Bloglines normalizes all of the feeds it collects before distributing feed content.
Randy: By normalizing the feed, the API becomes simple to implement, but will lack the flexibility and power required by a high end aggregator. This will be great for beginners wanting to write an RSS reader w/out worrying about bandwidth hogging.
I really hope nobody uses this API until Bloglines re-thinks this implementation. Although it may seem to reduce bandwidth between the RSS reader and the RSS publisher, the side effects make this a very undesirable implementation.
Last, I picked up this article by Marc Hedlund, primarily because the article links to an entry on my blog.
Jeremy Zawodny: Well, we just launched a beta of the next generation of My Yahoo that fits into that reality. Instead of "you can add anything you want, as long as it's on the list of My Yahoo content" you can now add pretty much any public RSS or Atom feed. In other words, the content model is open.
Randy: This is like twice as good as Firefox's RSS implementation. In other words, it doesn't quite suck, but it's very close.
Nathan Wallace: Initially I was angry at how cleverly Dare Obasanjo (an RSS Bandit developer) had twisted a source quote and implied that Sauce Reader's comments RSS support was causing 10,000+ hits on servers and generally behaving as a bad net citizen. I then became amused when I discovered that a "peacemaker" had picked up on this and immediately elevated Sauce Reader to evil status. The blogosphere at work...
Randy: Nathan doesn't like that I bundled his software into the Evil Software bin.
Nathan: By default, Sauce Reader will automatically check comment RSS feeds for new comments for the first 7 days after the weblog post is published.
Randy: I like the seven day throttling, that's a great feature, but what happens to comments left more than 7 days after the Weblog post is published? I would suggest reducing the throttling based on the pubDate or lastBuildDate of the comment feed.
For example, Juice, my personal browser, has built-in RSS support. If a feed hasn't been modified for a week, then I reduce polling six-fold. A month is twelve-fold. Three months, twenty-four-fold.
But that's not where Nathan is erring. Where he is erring is on polling feeds hourly that were not specifically subscribed to by the end-user. In Juice, any feed that is not specifically subscribed to is throttled back six-fold. In other words, I'll poll it every six hours for the first week, then every day and a half and so on, till I'm polling the feed as little as once or twice per week. And, I provide a more specific mechanism to subscribe to a comments feed. This gives the option back to the end-user to poll the hell-out-of Dare's server :)
Juice is not evil.
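The backoff schedule described above can be sketched in a few lines. This is my own framing of the text, not Juice's actual code; the function name and parameters are hypothetical.

```python
from datetime import timedelta

def poll_interval(age: timedelta, subscribed: bool = True,
                  base: timedelta = timedelta(hours=1)) -> timedelta:
    """Throttle schedule from the text: polling slows six-fold after a week
    of silence, twelve-fold after a month, twenty-four-fold after three
    months.  Feeds the user never explicitly subscribed to are throttled a
    further six-fold on top of that."""
    if age > timedelta(days=90):
        factor = 24
    elif age > timedelta(days=30):
        factor = 12
    elif age > timedelta(days=7):
        factor = 6
    else:
        factor = 1
    if not subscribed:
        factor *= 6
    return base * factor
```

So an unsubscribed comments feed that has been quiet for a week gets polled every 36 hours instead of every hour.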
Update: Just noticed the following on a follow-up read.
Nathan: Sauce Reader is among the market leaders in terms of general client behaviour: 1. The default feed refresh frequency is 4 hours.
Randy: This information means that Sauce Reader polls 1/4 as often as other readers. I wonder why? Also, this invalidates my "polling feeds hourly that were not specifically subscribed to by the end-user" comment. Must apologize for that. Sorry Nathan.
Another blog advertising exchange. This one is based on reader influence and reputation. Effectively, you inject your message by paying bloggers to write honestly about your product. Here's their thought...
That's enough experimenting w/ Bidvertiser.com. Time for a new advertising service. If the service ever gets worthwhile, then please do ping me as I really like the Bidvertiser.com model. Too bad their Website is broken.
Robert Sayre: If the service has discovery, the application can throw cached data away, safe in the knowledge that it can retrieve things later if needed. This is critical to mobile applications.
Randy: Stateless is beautiful.
Weblogs Inc.: Randy Charles Morin posts a lot of great material on RSS at his blog of the same name. Lately he's been taking some second looks at feed readers (Sharpreader and Pluck), as well as leading the charge against those who would claim that RSS does not scale. In order to clear the air on that issue, he's started a document called HowTo RSS Feed State (not the catchiest title, but still…), worth a look for anyone who a.) creates Yet Another Aggregator, or b.) needs to serve RSS files to a voracious public.
NewzCrawler: According to the answer on the NewzCrawler support forums when NewzCrawler updates the channel supporting wfw:commentRss it first updates the main feed and then it updates comment feeds. Repeatedly downloading the RSS feed for the comments to each entry in my blog when the user hasn't requested them is unnecessary and quite frankly wasteful.
Dare Obasanjo: Checking my server logs I found out that another aggregator, Sauce Reader, has joined Newzcrawler in its extremely rude bandwidth hogging behavior. This is compounded by the fact that the weblog software I use, dasBlog, does not support HTTP Conditional GET for comments feeds so I'm serving dozens of XML files to each user of Newzcrawler and SauceReader subscribed to my RSS feed every hour.
Randy: Evil software is everywhere. I see now why Dare's blog has been slow of late.
Omri Gazitt: One interesting existing application of this pattern is Atom (a publishing/blogging protocol built on top of SOAP). Looking at the Atom WSDL, it looks very much like WS-Transfer - a GET, PUT, DELETE, and POST (which is the CREATE verb specific to this application). So Atom could easily be built on top of WS-Transfer.
Source: Dare Obasanjo.
Randy: Interesting thought. I'll have to dive into WS-Transfer.
An early draft of my RSS feed state document. I'm hoping to create a one stop shop for writing RSS reader and publishing code, so that we can avoid the repeated RSS doesn't scale nonsense. Looking for feedback.
Excerpt: Of late, the blogosphere has been alive w/ claims that RSS doesn't scale. This started when Chad Dickerson, CTO of Infoworld, wrote an article called "RSS Growing Pains," where he explained that RSS traffic at Infoworld was out of control. Dare Obasanjo jumped to the rescue of RSS and showed how Infoworld was incorrectly using RSS. Then, less than two months later, Robert Scoble wrote his famous piece claiming "RSS is broken." Later Sara Williams admitted that Scoble's claims were exaggerated.
The problem w/ RSS is its simplicity. Developers can quickly write RSS feeds and publish content in record time. And this is great. This is why RSS is the future of the Web. The problems occur when software developers write bad code to publish and pull RSS feeds from the Web. It would be great if the RSS advisory board had a FAQ section that told us how to properly publish and pull RSS feeds, but this hasn't happened. As such, I've set out to do exactly this: show the busy developer how to properly publish and pull RSS content from the Web.
Don't think that I'm inventing a new wheel here. Everything in this document is already used by some RSS software. Everything has already been specified in the various RSS specifications, notes and HTTP specifications and extensions. Further, this is a living document that will describe the most widely used mechanisms for providing state to RSS feeds. I want the reader to think about this document and respond w/ his thoughts, especially where my thoughts are not the same as the mainstream thinking. Please send feedback and ERRATA to me.
I converted from Google's Adsense (on iBLOGthere4iM), which was paying me $0.26 CPM, to Bidvertiser.com today. In part because I'm a little upset at the low paying Adsense and in part because I want to experiment and check out new stuff. I'm getting $0.01 per click right now on Bidvertiser, but I assume that increases as more advertisers become aware of your site.
Update: Early results are showing a CPM below $0.01 :(
Tim Bray: Herewith a brief report from the opposition benches in the WS-Parliament. [cut] How many pages? [cut] 612.
Michael Hanson: there's another 569 pages for you.
Randy: WS-Pagecount = 1181 and counting.
Rob Sayre: Where there's WS-Smoke, there's WS-Fire.
Iron Maiden: Run to the hills run for your lives.
This morning, at home, I installed Pluck, another RSS reader. I've previously checked it out and was disappointed and I remain disappointed. Very difficult to use. Very difficult getting started. Not nearly as powerful or as cool as Sharpreader. It does provide a search facility on your retrieved blogs, but the results are difficult to read. It's an IE toolbar, which is a small positive, big negative. Sure you can auto-subscribe as you surf the Web, but you now have another Explorer bar and Explorer toolbar occupying your surfing space.
Scott Johnson: MP3 Enclosure Feeds (and more) Come to Feedster. Andrew Grumet is a good friend and he and I were talking the other day and he happened to mumble something about "ipodder.org". And, poof, it was like magic and I was hooked.
Frank: PhotoRSS is a multi-faceted approach for syndicating photos, and making them more searchable. PhotoRSS builds on formats such as HTML, XML and RSS, bringing to photos what has already been done for weblogs and text.
Randy: Something to watch.
I installed Sharpreader just yesterday. I've used it off and on over the last two years, but it had been a while since I last used it. I always thought it the best RSS reader and thought I'd check out the new features (and find out if it still works w/ my blogging software). Well, sure enough, it did work w/ my blogs. So, I loaded up a few dozen feeds and at least 10% failed, for one reason or another. In some cases, it was a bug in Sharpreader, but more often than not, it was buggy RSS. I'm amazed how much RSS still doesn't validate. In fact, several bloggers who write frequently on the topic of RSS and contribute to Atom, had invalid RSS feeds. What a state we are in. I don't expect your RSS feed to validate 100% of the time, even mine doesn't, but if your RSS feed validates 0% of the time, then few are actually reading your blog w/ an RSS reader.
Let me get back to Sharpreader. I haven't seen much improvement of late, but I note it now supports Atom. It's still the best RSS reader that I've used.
Update: ScriptingNews doesn't work very well in Sharpreader. The feed looks ok, but I think Sharpreader is failing to use the <guid> element and is causing some of Dave's posts to appear in duplicate.
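A dedup step keyed on <guid>, which is presumably what Sharpreader is skipping here, can be sketched like this. The item structure is hypothetical; the point is the identity key.

```python
def dedupe(items, seen):
    """Keep only items not seen before.  Identity is the <guid> when the
    feed provides one, falling back to (link, title) when it doesn't."""
    fresh = []
    for item in items:
        key = item.get("guid") or (item.get("link"), item.get("title"))
        if key not in seen:
            seen.add(key)
            fresh.append(item)
    return fresh
```

With guid as the key, an edited post keeps its identity and doesn't show up twice.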
Jim Rapoza: Many large sites that deliver RSS feeds recently started complaining that they are being hit every hour with a flood of reader requests that is, for all intents and purposes, the same thing as a denial-of-service attack. This happens because most RSS readers are pretty dumb about when they check for updates, and there's little the server can do to control this.
Source: George Dearing.
Dave Winer: On the Atom-Syntax list they're talking about versioning and extensibility, two problems that are very easily solved in XML. For versioning, define a required version attribute on the feed element, a string in the form x.y, where x and y are two numbers.
Randy: It's not that easy in Atom-land. You see, they have defined more than one root level element. They have allowed atom:entry to be a root level element in the API. The version must exist on all root level elements.
Dare Obasanjo: My article Improving XML Document Validation with Schematron is finally up on MSDN. It provides a brief introduction to Schematron, shows how to embed Schematron assertions in a W3C XML Schema document for improved validation capabilities and how to get the power of Schematron in the .NET Framework today.
Randy: Gotta love Schematron!
Source: Dan Gilmour.
Randy: I would define Moblogging as mobile blogging, that is, blogging from a mobile device. Mobile devices include PDAs, RIMs, J2MEs, Smartphones, Palms, cell phones w/ digital cameras and digital cameras and camcorders. Under Dave's definition, does writing a blog entry from an Internet Cafe qualify as moblogging? How about buying a magazine that you need for research on a blog entry you have in the back of your mind? I think Dave's definition incorrectly includes such blogging activities and I would exclude them from any definition of moblogging.
Rooneg: What does it do? In short, it allows you to only send new entries in your Atom feeds down to the clients. The client program adds a few HTTP headers (an If-Modified-Since to tell you what the last time they got was and an A-IM that indicates you support the 'feed' IM) and things just magically work.
Source: Sam Ruby.
Randy: This is RFC3229 support for Apache. Read Rooneg's original article and don't forget Sam's follow on. Further, Bob Wyman has been preaching 3229 on his blog. The only problem w/ 3229 is that it's not widely used by any HTTP clients and from what I know, it's not used by any RSS readers.
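The client side of RFC 3229 with the 'feed' instance manipulation amounts to a couple of extra request headers plus handling the 226 status. A sketch, with function names of my own invention:

```python
def delta_headers(last_etag: str) -> dict:
    """Headers an RFC 3229 'feed'-aware client sends on each poll."""
    return {
        "A-IM": "feed",              # we accept the 'feed' instance manipulation
        "If-None-Match": last_etag,  # ETag saved from the previous response
    }

def merge_response(status: int, cached: list, new_entries: list) -> list:
    """226 IM Used: server sent only the new entries; 304: nothing changed;
    200: server ignored A-IM and sent the full feed."""
    if status == 226:
        return new_entries + cached
    if status == 304:
        return cached
    return new_entries  # full feed replaces the cache
```

A server that doesn't understand A-IM simply answers 200 with the whole feed, so the client degrades gracefully.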
Paul Festa: As Web logs gain in popularity, critics warn that they are increasingly becoming the Internet's new bandwidth hog. The issue has been in the spotlight for much of this month, following a decision by Microsoft to abbreviate developer blogs both on its Web site and in syndication, citing a bandwidth crunch.
Randy: In the end, Sara Williams admitted that "RSS traffic is negligible compared to all the traffic generated." Kinda puts a damper on Paul's article. But had he listened, those who knew better were saying that Scoble was simply crying wolf.
Some eight months ago, Sam quoted me saying Atom is dragging out. RSS is not moving. Sigh. Does anybody believe we are in a better position eight months later?
Dave Winer: Just got off the phone with Steve Gillmor. He asked about yesterday's post about publisher-opt-out for RSS. I wrote up a proposal approx two years ago. Yesterday Adam had an interesting idea of how to do it. If a feed has a very large ttl value, then go ahead and unsub. I like that approach. Nothing new has to be invented or documented.
Randy: The above both seem like a bad idea. Many RSS readers, like my Juice browser are already implementing the 410 Gone solution. This solution for tearing down a subscription channel is not a new invention and is well documented. Nothing new has to be invented or documented. And it's already implemented by many RSS readers.
Don Park: I just finished looking at the code implementing Mozilla's RSS support (aka 'Live Bookmarks') and came up with these tips.
Randy: Don shows us how to implement Moz Firefox styled auto-discovery.
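Auto-discovery of the kind Don describes boils down to scanning the page head for alternate links carrying a feed MIME type. A minimal stdlib sketch (not Mozilla's actual code):

```python
from html.parser import HTMLParser

class FeedDiscovery(HTMLParser):
    """Collect autodiscovery <link> elements the way Live Bookmarks does:
    rel="alternate" plus an RSS or Atom content type."""
    FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if (tag == "link"
                and (a.get("rel") or "").lower() == "alternate"
                and (a.get("type") or "").lower() in self.FEED_TYPES):
            self.feeds.append(a.get("href"))
```

Feed the page HTML through the parser and `feeds` holds the candidate subscription URLs.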
Sara Williams: In a nutshell, our RSS traffic is negligible compared to all the traffic generated by Windows Update, MSN, downloads, and the rest of microsoft.com. We were motivated to reduce the size of the blogs.msdn.com home page primarily for operational efficiency's sake - why serve up 400k of content when we know that folks (except for Robert) don't read 400K of content on a web page.
Randy: In a nutshell, there never was a problem.
Remember: I'm guessing they were just crying wolf.
Randy: I told you so.
Scott Watermasysk: We still have a little more work to do on the home page (more details later) and need to address the Http 1.0 requests and requests without file extensions which are currently not HttpCompressed.
Robert Scoble: Scott Watermasysk, the guy who makes the .TEXT blogging tool that 1300 bloggers are using on weblogs.asp.net, is reporting that full text feeds are back on Microsoft's blogger's main feed.
Randy: Can I assume that Steve fixed the RSS doesn't scale problem? :)
Dave Winer: A perennial problem with RSS is how does the publisher force an unsubscribe? Shouldn't the publisher have the right to opt-out? Right now there is no mechanism that's broadly supported by aggregators.
Randy: I disagree. HTTP has a built-in mechanism for forcing an unsubscribe and if your blogging client doesn't support it, then it's incomplete. The mechanism is to respond w/ an HTTP 410 status code, that is, resource is Gone.
RFC: The 410 response is primarily intended to assist the task of web maintenance by notifying the recipient that the resource is intentionally unavailable and that the server owners desire that remote links to that resource be removed. Such an event is common for limited-time, promotional services and for resources belonging to individuals no longer working at the server's site.
First draft of the autodiscovery internet draft for Atom.
Sam Ruby: The core idea is that, sites that are willing to trade a little CPU for a bandwidth savings, subsetting the feed that is returned on a GET based on the ETag that was provided on the request may make sense.
Update: As Sam suggests, this protocol is complex. Let me add too complex. I have to wonder why you don't just make the lastModDate the ETag and return only results whose lastModDate is more current than the If-None-Match header? Seems to be some detail in the Vary: ETag that offers little gain. Again, I wouldn't implement this unless I had a really active and popular feed. Just wondering.
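Randy's simpler alternative, using the last-modified date itself as the ETag and subsetting against If-None-Match, might look like this. The entry layout is hypothetical; ISO dates compare correctly as strings.

```python
def subset_feed(entries, if_none_match=None):
    """entries: list of (iso_pubdate, item) pairs, oldest first.  The ETag
    is simply the newest pubdate; only items newer than the client's ETag
    are returned."""
    if not entries:
        return 304, if_none_match, []
    etag = entries[-1][0]
    if if_none_match == etag:
        return 304, etag, []          # client is already up to date
    new = [item for (d, item) in entries
           if if_none_match is None or d > if_none_match]
    return 200, etag, new
```

No Vary header gymnastics needed: the server computes one value and compares one string per request.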
Dare Obasanjo: The folks paying for the bandwidth that hosts Weblogs @ ASP.NET (the ASP.NET team not MSDN as Scoble incorrectly surmises) decided they had reached their limits and reduced the content of the feeds. It's basically a non-story. The only point of interest is that if they had announced this with enough warning internally folks would have advised them to turn on HTTP compression for HTTP 1.0 clients before resorting to crippling the RSS feeds. Act in haste, repent at leisure.
Randy: Dare nails it again!
Scripting: On Wednesday he posted that MSDN was limiting its RSS feed, dropping full text, because they were serving terabytes through RSS, and weren't happy with the economics. The solution is simple, and it follows the grain of the Web, follows the intent of micro-publishing, and it's the way every other blogging community works -- simply offer a feed for every blog.
Randy: They do provide a feed for every blog. That's not the problem.
Robert Scoble: It's not scalable when 10s of thousands of people start subscribing to thousands of separate RSS feeds and start pulling down those feeds every few minutes (default aggregator behavior is to pull down a feed every hour). Bandwidth usage was growing faster than MSDN's ability to pay for, or keep up with, the bandwidth. Terabytes of bandwidth were being used up by RSS.
Randy: I've heard this too many times before. In fact, they have complained, they were told they were wrong and how to fix it, and they didn't fix it. I'm guessing they were just crying wolf. I guess we'll rehash this issue again. The wolf criers will be shown what they are doing wrong and the issue will eventually go away again. Or maybe we'll fix another problem that only exists when you write bad code.
The Atom syndication format w/ contributions from yours truly. The fourth paragraph of section 4.2.2 "atom:link" Element. Shown in italics.
If a feed's atom:link element with type="alternate" resolves to an HTML document, then that document SHOULD have an autodiscovery link element [Atom-autodiscovery] that reflects back to the feed.
Don Box: One thing I'm noticing is that the "object/XML impedance mismatch" is on the verge of being eclipsed by the "relational/XML impedance mismatch."
Randy: I think too much effort is put into criticizing existing technologies and not enough effort is put into developing real applications. Not that I mind, I seem to be the only person focused on verticals v. horizontals.
I hate the word platform.
I find slow comment systems very discouraging and pervasive (they have managed to spread themselves everywhere). What exactly is taking so long? Are they rebuilding the entire blog to account for one new comment? To the developers responsible for writing these slow comment systems, I have a small phrase for you.
Fork a thread once in awhile.
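In code, "fork a thread" means acknowledging the comment right away and pushing the expensive rebuild into the background. A sketch under obvious assumptions (the store and rebuild callables are stand-ins for whatever the blog tool uses):

```python
import threading

def accept_comment(comment, store, rebuild):
    """Save the comment immediately; run the page rebuild on a worker
    thread so the commenter isn't left staring at a spinner."""
    store.append(comment)
    worker = threading.Thread(target=rebuild, daemon=True)
    worker.start()
    return worker   # caller can join() if it needs the rebuild finished
```

The commenter gets an instant response; the static pages catch up a moment later.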
Shelley Powers: If a syndication feed supports complex or hierarchical categories but these are optional, and one weblog tool supports them, but another doesn't, using the syndication feed to export the data from one tool to the other will result in loss of data; using XML won't improve this situation, or prevent the loss of data. That's where the MT import/export format comes in handy at this time.
Randy: This statement is completely misleading. If tool A supports feature C and tool B doesn't support feature C, then no intermediary format will avoid loss of the feature when converting from A to B. That is, MT's import/export format won't improve this situation or prevent the loss of data either. So, how does it come in handy? It doesn't.
All the blogs I run using my custom blogging tool support exporting to RSS out-of-the-box. This functionality is trivial and should be demanded in any blogging tool. Further, exporting to a proprietary format, especially one that is not XML, provides very little to zero benefit. Exporting to a proprietary XML format, provides little benefit. Exporting to a known XML format, is the only way to go. RSS is the best of breed intermediary format for converting between blogs. Please, don't invent me a new wheel, I've already got too many (Atom).
A great list of mostly HTML/CSS Webmaster tips from W3C. Oleg will like the last one.
Raw Blog: Atom has a Person Construct, relatively new and not yet finished. FOAF has one that's proving successful, and is already quite widely deployed, some of it in exactly the same application domain (quite a few RSS 1.0 feeds now include FOAF). So I've posted a page on the Wiki aiming to give a quick overview of FOAF for Atomistas unfamiliar with the stuff.
Randy: Here's an opportunity for Atomites to correctly underspecify. Point the atom:Person construct to the foaf:Person construct. Do it this way and let the FOAF people do the work. Then we have 100% compatibility rather than yet another wheel.
Haacked: If you installed SP1 for the .NET framework, you may notice that certain feeds are broken and return an HTTP Protocol Error. Dare looked into this and posted an explanation and workaround to the problem.
Dave Orchard: There's been some debate about whether or not the WS-I should profile schema. I lean a little bit towards thinking it's a good thing, but I think the rub will be whether Web services can use a reasonable subset. TimE posted some initial thoughts and then comments on profiling schema and I'll respond to some of his points.
Randy: Dave continues the profile XSD debate and Tim responds further.
Tim Ewald: Dave Orchard posted this piece in response to my posts on profiling XSD. He thinks I'm missing the point that this is an issue for users because they can't get interoperability across their implementation of schema. I disagree.
ZDNet: While not a comprehensive list, the following best practices from leading Fortune 500 companies and collected across numerous industries are a solid starting point to further protect enterprise resources with XML Web Services security.
5. Validate all messages
One simple step to prevent this problem is to use XML Schema Definitions (XSD) to validate both inbound and outbound data.
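A minimal validation gate along these lines, sketched with lxml; the schema and element names are made up for illustration:

```python
from lxml import etree

# Hypothetical message schema: a <msg> wrapping a single string <body>.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="msg">
    <xs:complexType>
      <xs:sequence><xs:element name="body" type="xs:string"/></xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(XSD))

def is_valid(xml_bytes: bytes) -> bool:
    """Gate inbound and outbound messages: reject anything that is not
    well-formed or does not conform to the schema."""
    try:
        return schema.validate(etree.fromstring(xml_bytes))
    except etree.XMLSyntaxError:
        return False
```

The same check runs on both directions of traffic, which is the point of the best practice above.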
Randy: Les Dude!
Tim Ewald: Randy added that, from his perspective, subsetting XSD wouldn't help anyway. He believes that the major problem with XSD is not what it offers, but what it does not. My opinion on this point is pretty clear.
Randy: If Tim Ewald believes that XSD is too broad and Atomites believe that XSD is too narrow, then what we have is the middle compromise.
John Munsh: Here's the thing. One thing used to grind me to a halt when it came to certain types of programs in the old days. PARSING. I hated building anything that parsed and I still do. XML took 90% of that away so I don't have to worry about it, it's a lingua franca that applications can speak and humans (barely) can as well if they need to. Once XML came along I started being able to use low level tools like DOM and SAX to address my parsing needs and now there are lots of excellent high level tools like JiBX, Digester, etc. which with some simple instruction about the structure of a document will go directly from the XML to fully populated objects in structures or collections. I'd rather have the power to not waste time on the parsing and devote that instead to the program's functionality and user interface. And you must remember that your needs are just for a couple of quote characters to be allowed. For someone else it will be allowing a stray ampersand, for another it will be allowing a high bit character (em-dash, open quote/close quote, etc.) into a document marked UTF-8. Once you allow for _all_ the odd little exceptions, the parser can turn out to be quite a nightmare indeed.
Randy: John says XML is about never having to worry about parsing. Parsing is solved. XML solved it. I agree.
Oleg Dulin: This whole valid-invalid BS is making RSS difficult for both reader makers and publishers.
Randy: Oleg thinks that the narrow definition of valid RSS is making life difficult. I think people should write better software.