RSS, OPML and the XML platform.
Copyright 2003-5 Randy Charles Morin
Most every conversation regarding Web bandwidth issues eventually leads into a discussion about caching. Cool! Web caching. And you know what I know about cool? It rarely works. If it worked, then it would be uncool. Here's an argument presented by Sam Ruby on the RSS advisory board public mailing list.
the fact that the HTTP expires header (which *is* widely implemented) may make this value irrelevant - i.e., if you are behind a caching proxy, you can attempt to fetch as often as you like, but you will simply get back the same data
Sam is not the only person making this argument. It's a very popular argument on technical mailing lists. What is wrong with this argument? It's the words MAY and IF. How many of you put a caching proxy between your Web server and the Internet? How many of you put a caching proxy between your home computer and the Internet? Please lift up your hand. Anybody? Hello? Is this on? Web caching is cool!
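To make the MAY and IF concrete: the Expires header only saves bandwidth when something in the path actually honors it, whether that is a caching proxy or the feed reader itself. Here is a minimal sketch of the check a client would have to perform before refetching. The function name `is_fresh` and its parameters are my own invention for illustration, not part of any aggregator.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh(expires_header, now=None):
    """Return True if a cached copy may be reused instead of refetching.

    expires_header is the raw value of the HTTP Expires header, or None
    if the server did not send one (hypothetical helper, for illustration).
    """
    if not expires_header:
        return False  # no header: nothing tells us the cached copy is still good
    try:
        expires = parsedate_to_datetime(expires_header)
    except (TypeError, ValueError):
        return False  # unparsable values such as "0" count as already expired
    if expires.tzinfo is None:
        expires = expires.replace(tzinfo=timezone.utc)  # assume UTC when unstated
    now = now or datetime.now(timezone.utc)
    return now < expires
```

A reader that skips this check, with no proxy in front of it, hits the origin server on every poll no matter what the header says.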
Some smart people from the University of Maryland created a site called Feeds That Matter that uses an algorithm to determine the top blogs in various categories. They also wrote a research paper on the algorithm, which uses Bloglines data. The paper itself is the most interesting part of this project.
Their research indicates that nearly half of the feeds in Bloglines are from the Blogspot.com domain. In contrast, nearly half of the feeds in Blogpulse are from the livejournal.com domain and Blogspot.com ranks only 4th. Different datasets indeed.
Looking more closely at the Feeds That Matter website, I selected the RSS category to determine how well their algorithm actually ranks blogs. The top two feeds in RSS were Lockergnome's RSS and Atom tips and Webloginc's The RSS Weblog. Lockergnome's site is an ad-infested blog with little original content, mostly quoted content from other blogs, few reader comments, few inbound links and sporadic activity. The Webloginc blog was recently shut down and has been dormant for six months. The Bloglines homepage ranks 4th, which doesn't make much sense at all. Dave Winer's RSS Blog ranks 6th and has been offline for many months. Many of the other top feeds are also barely active or dormant. They should add a component to their algorithm that includes recent activity.
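One way such a recent-activity component could look is an exponential decay applied to whatever score the ranking already produces. This is only a sketch of my suggestion, not anything from the Feeds That Matter paper. The function name, the 90-day half-life and the score scale are all assumptions.

```python
from datetime import date


def recency_weighted(base_score, last_post, today, half_life_days=90.0):
    """Halve a feed's score for every half_life_days since its last post.

    base_score stands in for whatever the existing ranking produces; the
    half-life of 90 days is an arbitrary value chosen for illustration.
    """
    days_idle = max((today - last_post).days, 0)  # ignore future-dated posts
    return base_score * 0.5 ** (days_idle / half_life_days)
```

With this weighting, a feed that has been dormant for six months keeps only about a quarter of its original score, so dormant blogs sink without being removed outright.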
hey, this is an informal announcement that as of today jan and the board are gone. i've taken over tribe and you're going to see big changes fast like getting rid of this big stupid masthead and returning tribe to the users where it belongs.
Nate Anderson: The German-language version of Wikipedia will get an experimental overhaul in the next few weeks designed to cut down on vandalism, edit wars, and misinformation. How will it work? Through the magical power of trust. In the German system, any user will still be allowed to make edits to any article. Those edits won't show up in the live version of the site, though, until a registered user with a certain level of time and experience approves the changes.
Randy: Anybody can edit. Anybody can edit. Anybody can edit, but only the elite can publish. All animals are equal. All animals are equal. All animals are equal, but...
I was invited to a limited beta of Amazon EC2 (Elastic Compute Cloud). Sounds pretty cool. Like the storage service, it costs a minimal amount of money.
Just as Amazon Simple Storage Service (Amazon S3) enables storage in the cloud, Amazon EC2 enables "compute" in the cloud.
Reading my RSS today, I discovered that del.icio.us changed the key format of their private feeds. Don't they know I'm lazy and highly likely not to re-sub?
this private feed needs a key to unlock its contents. we have recently changed our key format. please get a new link to this feed with the new key
OK, I'm gonna make one or two friends a little upset here, but I keep seeing them make some very critical blogging mistakes and I wanted to enumerate them before I'm the only blogger left with any readers and Google juice [yes I'm stupid, check my last name]. Let's jump right in...
Bonus: This one I fail todo all the time. Not using a spellchecker and a grammar checker. In fact, I did it again. Ascribe it to laziness.
Mike Sansone gave The RSS Blog an honorable mention along with RSS Applied in his Blue Ribbon category for Feeds, RSS and OPML. The winner was Mark Woodman's inkBlots. That's some pretty cool company I'm hanging with. Thanks Mike! Mike also names winners in 10 other categories. He mentions a lot of pretty cool blogs. Worth checking them out to build out your reading list.
In a recent post, Robert Scoble stepped out of his Microsoft BOGU shadow and landed a knockout punch on Microsoft and Dare Obasanjo. The Scobleizer pointed to a list of live.com's most recently updated Spaces to note that most of them have no public blog entries. Robert may have been wrong on some points, but the fact is, he revealed a major failing of live.com Spaces, which is that their blogs contain mostly non-public posts or no posts at all. He was also not the first to point out the weakness of Dare's counterarguments. Dare doesn't pull punches, but name calling doesn't make an argument. You have to wonder about Microsoft's face to the blogosphere with Scoble and Niall 404, as Dare's argumentative style may re-unearth the darker side of Microsoft.
Jason Goldman: This release of Blogger will also be the last I work on as I am leaving Google at the end of next week.
Randy: Jason was product manager of Google's Blogger and Google Blog Search and is a member of the RSS Advisory Board. I'm sensing another great Web 2.0 startup in his future.
Update: Sorry, I incorrectly reported Jason Goldman as a member of the RSS Advisory Board. It's Jason Shellen of Google who is on the RSS Advisory Board.
Kevin Burton: There might have been 50 million blogs that have ever been created but there aren't 50 million blogs in active use.
Randy: Kevin is disputing David Sifry's claim that there are 50 million blogs. Unfortunately Kevin made four mistakes in his arguments.
James Holderness blogged his typically brilliant analysis of how aggregators detect duplicate RSS items. Must read for all RSS tool developers.
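For tool developers who want a starting point, here is a minimal sketch of one common approach: use the item's guid when it has one, otherwise fall back to a hash of link and title. This is my own illustration, not a summary of what James found any particular aggregator doing. The names `item_key` and `new_items` and the dict-based item shape are assumptions.

```python
import hashlib


def item_key(item):
    """Build an identity key for an RSS item given as a dict (assumed shape)."""
    guid = (item.get("guid") or "").strip()
    if guid:
        return "guid:" + guid
    # No guid: fall back to a hash of link and title, so an edited
    # description does not make the item look new.
    basis = "\n".join((item.get(field) or "").strip() for field in ("link", "title"))
    return "hash:" + hashlib.sha1(basis.encode("utf-8")).hexdigest()


def new_items(items, seen):
    """Return items whose key is not yet in the set seen, recording them."""
    fresh = []
    for item in items:
        key = item_key(item)
        if key not in seen:
            seen.add(key)
            fresh.append(item)
    return fresh
```

The fallback is where aggregators tend to differ; hashing the description as well would flag every edited post as a new item.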
OK, so you buy a domain and web service hosting for one year ($200?). Then you quickly throw a webpage together that renders some XML as HTML. Put it on eBay and pocket $4,000? Repeat weekly and you are making $200,000 per year.
Chris Pirillo: If you're a .INFO owner, sell it to a spammer and rebrand yourself - please. For goodness sake, let's take a mulligan and pretend this whole .INFO thing never happened.
Forbes has written the screenplay to the upcoming Web 2.0 movie on PubSub. It's a good read and confirms most of what I've heard independently. I've had a few dealings with Bob Wyman (exchanged emails and blog posts) and Salim Ismail (met twice), the founders. Bob was very standoffish and denied a problem with the blogosphere ping even though the PubSub website, which relied on the ping, was mostly nonfunctional. I can't wait for Salim's tell-all book.
Moral of the movie: Sometimes... ping is better than push.
Reviews from the blogosphere.
All month, I've been quietly advertising the fact that I've been approaching angel investors in the Toronto area about funding Rmail, my little project that now has 20,000 users. Rick Segal and Mark Evans were kind enough to write up Rmail. Thanks guys!
Update: Jeff Nolan joins the fun!
Rogers Cadenhead: The proposal to revise the RSS specification has passed 7-0 with RSS Advisory Board members Matthew Bookspan, Rogers Cadenhead, Randy Charles Morin, Greg Smith, Loïc Le Meur, Jenny Levine and Eric Lunt voting in favor.
1. The docs element refers to an outdated URL for the specification instead of the current URL.
For as long as the board operates, http://www.rssboard.org/rss-specification will be the permanent URL of the current version of the spec. The domain name is the property of the board, so it can move to a new host as needed in the future.
John Palfrey at Harvard told me that the URL http://www.rssboard.org/rss-specification is going to become the permanent URL of the original Harvard spec published in 2003 (not any of the board's subsequent derivations). You can find a copy of the original Harvard spec here.
2. The spec encourages people with questions about RSS to post them on the RSS2-Support mail list hosted by Sjoerd Visscher.
This list is no longer active, receiving more spam than RSS-related posts. Our own RSS-Public mailing list is a better place to seek help.
|466||LiveJournal / LiveJournal.com|
|461||Microsoft Spaces v1.1|
|198||CommunityServer 2.0 (Build: 60209.2598)|
The rest of the list is published on the Rmail blog.
Niall Kennedy: I am leaving Microsoft to start my own company.
It seems Michael Arrington is back to his old tricks. As I've mentioned before, I don't subscribe to TechCrunch, because Michael's opinion is blatantly tainted. Here's another example. Recently Michael wrote a review of Hubpages. In the comments, Jim Woolley points out that Hubpages has the same issues that caused Michael Arrington to say Feedpass Does Absolutely Nothing. Michael's response? "Ordinarily I'd engage with you on it but the combination of verbal attacks from you and Randy was just too much." In other words, Michael's allowed to initiate an attack with blatant lies like Feedpass Does Absolutely Nothing, but when Jim disagrees with Michael, Michael blacklists Jim's business and Jim's opinions.
I'd also like to point out that Michael has agreed that the previous criticisms of him were fair, but he has never apologized publicly for his Feedpass comments, and now he says those criticisms were just too much.
Marjolein Hoekstra of CleverClogs passes along a very interesting link that shows a weakness in the RSS infrastructure that would be solved by the new Bloglines' Feed Access Control. Now, I'm unsure how this feed got into FeedShow, as I don't seem to have an account with them. I assume they are pulling information from other places to seed their database, but I'm just guessing. Maybe I've subscribed to this private feed in another aggregator and, since the authentication data is contained within the subscription URL, the credentials are made public somewhere along the route from del.icio.us to FeedShow. Unfortunately, since Feed Access Control is an extension of RSS, few providers are gonna implement this and the solution is not perfect, but it's better than nothing. This is far from the first time that private feeds have become public via an online RSS aggregator. You may remember the Gmail/Bloglines problems.
Previous entries on Bloglines's Feed Access Control.
Update: Here's a list of common extensions in Rmail's 17166 RSS 2.0 feeds.
It's been almost a year since I last reported my top User-Agents as reported by FeedBurner. Here goes.
I limited the list of readers to those reporting 6 or more subscribers.
I pulled a list of feed URLs out of Rmail and ran some stats on them. There were 26,418 unique feed URLs. This is significantly higher than the number I report on the Rmail stats webpage, because it includes feeds that have been subscribed to in the past, but are not currently subscribed to.
Now let's move onto feed type.
The blogosphere is buzzing about Bloglines' new RSS extension.
Alex Barnett: The priorities seem wrong here - I don't see this step getting us any closer to getting better services when there are other much more fundamental issues that need solving.
Marshall Kirkpatrick: This is a great idea for which the time has come.
Scott Johnson: Its a fine gesture but that's all it is - a gesture - and like many gestures it won't actually solve anything.
Danny Ayers: I can't see any problem with this proposal on technical grounds, the fact that its capability is likely to be misunderstood probably outweighs its potential benefit.
Randy: That's a lot of negativity. As someone who runs a search engine, I think I know the motivation here. Every few days, I get an email from somebody who is concerned that I'm reposting in my search results fragments from their blogs, journals, etc. I assume Bloglines gets a lot more than I. This extension gives Bloglines a way of responding to these complaints without having to create and maintain a blacklist of RSS feeds. It solves a concrete problem. Well done.
The engineers at Bloglines have created an RSS extension that allows publishers to specifically opt out of republication, by simply adding an access:restriction element as a child of the root <rss> node.
<rss version="2.0" xmlns:access="http://www.bloglines.com/about/specs/fac-1.0"> <access:restriction relationship="deny" />
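On the consuming side, honoring the extension takes very little code. Here's a minimal sketch of how a search engine or aggregator might check a feed before republishing it, assuming the element sits directly under the root <rss> node as in the snippet above. The function name `republication_denied` is mine and is not part of the Bloglines spec.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the Bloglines snippet above.
FAC_NS = "http://www.bloglines.com/about/specs/fac-1.0"


def republication_denied(feed_xml):
    """Return True if the feed's root element carries a deny restriction.

    Only looks at direct children of the root element, matching the
    example shown above; feed_xml must be a complete XML document.
    """
    root = ET.fromstring(feed_xml)
    restriction = root.find("{%s}restriction" % FAC_NS)
    return restriction is not None and restriction.get("relationship") == "deny"
```

A crawler would call this once per fetched feed and skip indexing whenever it returns True, which is a lot less work than maintaining a blacklist by hand.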