RSS, OPML and the XML platform.
Copyright 2003-5 Randy Charles Morin
Dan Frommer: Meanwhile the site's [YouTube] bandwidth costs, which increase every time a visitor clicks on a video, may be approaching $1 million a month. [cut] Industry observers estimate that YouTube, which is streaming 40 million videos and 200 terabytes of data per day, may be paying between a tenth of a cent and half a cent per minute.
Chris Wetherell: In our never-ending quest to help you waste your day in better and richer ways, we've added support for video feeds that have Flash players. Mainly, this means you can watch videos from Google Video or YouTube directly in Reader.
Randy: Don't forget to add me, 'randymorin', to your network. Then you can easily send me links. And remember, the best link every month gets a free book at Amazon.
BlogExplosion.com has been sold to Stephen Sartain and a group of private investors for a six figure amount. The deal was brokered by Jeremy Wright.
This is the biggest validation of Web 2.0 that I know of. For those that knew of BlogExplosion, it was a hit trading exchange. Basically, you read other people's blogs and in return you get an equivalent amount of reads on your blogs from other people as desperate as you. I tried it out for a few days sometime last year. I realized the only people that would use it are those that write the worst content and are completely desperate for traffic. If that sold for 6 figures, then we are definitely in a bubble of sorts.
By Lore Sjöberg on Wired...
Is it true that anyone can contribute?
Sure, Wikipedia is absolutely open to absolutely anyone contributing to absolutely anything! As long as you haven't been banned, or the article you're contributing to hasn't been locked, or there isn't a group of people waiting to delete anything you write, or you don't make the same change more than three times in one day, or the subject of the article hasn't decided to send scary lawyer letters to Wikipedia, or you haven't pissed Jimbo Wales off real bad. It's all about freedom.
PR: Microsoft Corp. today announced the spinout of a new social networking technology, developed by Microsoft Research, to create a new Silicon Valley startup, Wallop Inc. Wallop, whose aim is to deliver the next generation of social computing.
Randy: Two years ago, when I first saw MyWallop.com, I was amazed. It was by far the best social computing Website I had ever seen. Two years later and MyWallop hasn't changed much. Now MyWallop wants to get out of Beta and go live later this year. They may already have missed the boat.
Bob Wyman is championing the cause of the OPML ping infrastructure and suggesting that the RSS ping infrastructure is working just fine.
Bob Wyman: Pito Salas of BlogBridge recently wrote that they have implemented "Reading List Pings." This brings to OPML the same push-based bandwidth saving technology pioneered and proven in the realm of RSS/Atom syndication.
I cannot believe at this time that anybody continues to believe that the blogosphere ping infrastructure is working. Bob has staked his company on the ping infrastructure and this has resulted in PubSub's indexer breaking repeatedly over the last year. I blog countless times per day (call it a dozen) and ping PubSub with every new blog entry, yet PubSub has indexed my Website on only one third of the last 30 days. That's 12x30=360 pings in a month, and I got indexed on only 10 of those 30 days. That's somewhere between 240 and 350 missed pings out of 360. This is the technology proven in the realm of RSS/Atom syndication? I think not.
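For anyone who wants to check my math, here is a minimal sketch of where the 240 and 350 come from. It assumes exactly 12 pings a day (my own estimate) and treats "indexed on a given day" as meaning anywhere from one to all twelve of that day's pings got picked up. The variable names are just for illustration.

```python
# Bounds on missed pings, using only the figures quoted above.
POSTS_PER_DAY = 12   # estimate: a dozen entries a day, one ping each
DAYS = 30
DAYS_INDEXED = 10    # one third of the last 30 days

total_pings = POSTS_PER_DAY * DAYS

# Best case: every ping sent on an indexed day was picked up.
min_missed = total_pings - DAYS_INDEXED * POSTS_PER_DAY

# Worst case: only a single ping was picked up on each indexed day.
max_missed = total_pings - DAYS_INDEXED * 1

print(total_pings, min_missed, max_missed)
```

Even in the best case, two thirds of the pings went nowhere.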
Alec Saunders: Last August I began a serious effort to try to increase the traffic to my blog, after switching from Radio (where it had been for three years) to WordPress. Since then, I've gone from an average of 216 visitors per day, to 3400. March was the first month I had more than 100,000 visits, and over a half million hits. At Barcamp Ottawa, on Saturday, several people asked about this, and I promised to put together a "How to" post.
Randy: Alec writes an excellent read on how to get your blog noticed. And he compliments me too. Thanks Alec!
CNN: Five teenage boys accused of plotting a shooting rampage at their high school on the anniversary of the Columbine massacre were arrested Thursday after a message authorities said warned of a gun attack appeared on the Web site MySpace.com.
Randy: Anybody want to argue how terrible MySpace is?
After two years of "polling RSS feeds won't scale" debates that have ended with the naysayers turning into mutes, the debate is now whether polling OPML files will scale. SHUT UP!
Dare Obasanjo: Dave Sifry plays fast and loose with language by interchangeably using blogosphere and number of blogs Technorati is tracking. There is a big difference between the two but unfortunately many people seem to fail at critical thinking and repeat Technorati's numbers as gospel. It's now general knowledge that services like MySpace and MSN Spaces have more blogs/users than Technorati tracks overall. [cut] I suspect that the number of blogs out there is closer to 100 million.
Randy: The thing about Dare is ... he doesn't pull punches. I like that!
Scott Karp of Publishing 2.0 noticed that Technorati results were changing radically, that Dave Winer was nowhere to be found and was being replaced by MSN Spaces blogs. The fear, of course, is that Technorati was compromised by SERP (search engine result page) spammers. Chris Edwards followed up with some actual analysis and determined that it wasn't SERP spam, but rather some lucky MSN Spaces bloggers that snuck thru a hole in Technorati's engine and other very explainable occurrences. In other words, Technorati is not compromised but rather broken, and this isn't a new thing. Dare Obasanjo noticed a bunch of MSN Spaces in the Technorati index over a month ago. Similar occurrences were also noted more than two months ago and as Domiziano Galia explained, in the comments for The RSS Blog, "every blog on MSN Spaces has a box where are shown links to other MSN Spaces blogs. This is not a choice from the blogger, but an automatism of the platform, so that every blog gets free unjustified links."
The problem is that Technorati does not index blog entries, but rather webpages. Technorati often reports referrers to my own blogs where the blogger is simply listing me in his sidebar blogroll. Actually, it wouldn't be so bad if I got one referrer from that sidebar link, but Technorati will often repeatedly give me referrers every time that blogger writes a new entry. So, we have a big problem. Is MSN Spaces gaming the Technorati index? Guess what, they are not. MSN Spaces is marking these automated links with NOFOLLOW attributes, as they are supposed to. I then went to the Technorati 100 and checked if they were including links with this attribute and found out they were. Why is Technorati including NOFOLLOW links in their rankings? Technorati drafted a specification of the NOFOLLOW standard more than a year ago. More than six months ago, David Sifry wrote "Early this year, a number of search engines including Technorati adopted the rel='nofollow' microformat." David? Are you sure? 'Cause all the evidence indicates that Technorati is still ignoring NOFOLLOW attributes. David, what happened to "we have been battling the spam situation in a significant way for about 2 months"?
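To show how little it takes to honor the attribute, here is a minimal sketch of a link extractor that leaves rel="nofollow" links out of the count. It says nothing about how Technorati's indexer actually works. The class name, function name and sample markup are all made up for illustration, and it uses only Python's standard html.parser module.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values, separating rel="nofollow" links from the rest."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel is a space-separated token list; compare case-insensitively.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


def countable_links(html):
    """Return only the links that should count toward a ranking."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.followed


# Hypothetical page: one editorial link, one automated "other blogs" box link.
sample = (
    '<p><a href="http://example.com/post">a real citation</a></p>'
    '<div><a rel="nofollow" href="http://example.org/auto">box link</a></div>'
)
print(countable_links(sample))
```

With a filter like this in place, the automated box links would simply never reach the ranking.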
While I'm beating up on Technorati, let me also point out that World Live Web is no longer live and mostly cluttered with link spam. Let me show you an example. Every morning, I do a search for people linking to me. Here's a screenshot of what I see most mornings.
According to Technorati, Memeorandum linked to me 1 and 2 hours ago, but as you can see from the titles, the links are from several months ago. This is actually better than usual. Usually, the first few links are the overnight link spam attack on Blogspot.
Now, don't get me wrong, Technorati is actually not that much worse than the other blogosphere search engines, although they are clearly now the worst. Similar searches on Feedster give me nothing but smog (blogosphere spam) and posts from my own blogs. Bloglines Citations tends to be good at eliminating the smog, but more often than not, it reports errors and false positives. PubSub too is good at finding results, but also good at finding smog and often broken. BlogPulse has been better, but I've noticed more and more splogs in the results. IceRocket and Google Blog Search have the least smog and generally get me some good results. I'd say they rank #1 and #2, with IceRocket in the #1 spot based on Google being the heart of the splogosphere. I actually use IceRocket and Google Blog Search throughout the day, whereas I might hit the others once or twice on the off chance they pick up a referrer not reported by IceRocket or Google.
Well, that's my rant. Guys, pick up the slack!
Previous State of Blogosphere Search articles.
Today, I stumbled onto a great new RSS extension written by Buy.com to enhance RSS with more detailed product data. It's about 2 weeks old and IMHO, one of the best extensions I've seen to-date. They already have an array of RSS feeds using the extension that list their top selling products. The feeds don't validate (even the sample in their RSS extension documentation), but the mistakes shouldn't affect usage much.
This is a note to whoever is in charge of the Harvard website that publishes the RSS specification. Please upgrade the server to handle its current load or free RSS from your apathy.
If you're concerned about the non-responsiveness of the Harvard website that publishes the RSS specification, then please re-blog this.
Joe Gregorio: There are some minor rough spots but it looks very good.
Richard MacManus: Why Google is extending RSS.
Vincent of A Feed Is Born: This protocol could be the base of all new Google services from now on.
Deeje Cooley: As a client developer, my first question is... when/does this work with Blogger? And my next question is, who else is on board? SixApart? Roller? Wordpress?
Matt of HowRadical: Gdata. Sounds like a 2-way, push/pull version of RSS.
Maurice Cook: GData means that the walled garden of Google's applications is coming to an end
Steven Livingstone-Perez: If Microsoft had done this they would be hammered in the media..
Karl Martino: Google does an end run around the RSS and Atom war. GData, Google's new API to read and write from the web, combines elements of both.
Mark McLaren: This all sounds a bit like the beginnings of a Google powered enterprise portal to me.
Reto Meier: The protocol and APIs are fully extensible so expect to see GMail, RSS reader, news, bookmarks, blog, and search APIs roll out using the GData API.
Amit Agarwal: Get your creative juices flowing. Build calendars on your site using Google Calendar Data API.
Randy: This is simply a snapshot of a few trees in a forest. I think I could've quoted 100 intriguing statements. What does that mean to me? GData is disruptive.
Bob Wyman: There has been all sorts of commentary recently about the new world of Web 2.0 applications and mashups built on "free" APIs. Somewhere in all the excitement, people seem to have forgotten Robert Heinlein's famous warning: TANSTAAFL ("There Ain't No Such Thing As A Free Lunch..."). While mashups based on free API use are often innovative and sometimes delightful, the providers of the APIs being exploited in the mashups will eventually need to find some way to monetize the services they are providing or to at least limit the cost of providing these valuable services.
Randy: Or people could start building free APIs with business models built-in. The problem is that most APIs are developed without regard to financial scalability, by the hacker without any business sense.
Over the last week, I've been getting the occasional meeting notification directly in Gmail and accepting the event places the event in my Google Calendar and returns a confirmation to the inviter. Gmail+Google Calendar is Outlook Web Access without the $Exchange$ server. I even created my own event, wow it was easy, and invited 3 friends. Google Calendar does to Yahoo! Calendar what Gmail did to the old Yahoo! Mail. It made it obsolete. Now, if only I could import my Yahoo! data into Google.
Oh ya! And they have an API.
Dave Winer: I just heard a rumor that AOL is going to challenge MySpace, "head on," to be announced in approximately two weeks.
Randy: The rumour was confirmed on an AOL blog called downloadsquad. I highly doubt they will live up to the MySpace killer promise. What you have to remember is that MySpace really isn't all that good. MySpace was at the right place at the right time and word-of-mouth in high schools did the rest.
Mark Cuban: Icerocket.com has added a Myspace tab. Any search defaults to myspace.com and you can search for anything and everything. It's fast. It's easy. It's not infallible. It's not a replacement for keeping an eye out for what your kids are doing when they surf the net, but it might make you as a parent feel a little more comfortable and arm you with a tool that can only help.
Randy: Very cool! Now why didn't I think of this, we all say. Not only is this great for parents looking out for their kids, it's also great for kids looking for their friends.
Randy: Mike does a great job of showing you how to design a more useful and eye-pleasing MySpace homepage. Check-it-out!
Ten days ago, I reported that Blogware was blocking Google referrers. I even sent an email to a friend who works at Tucows. No response. Very weird. I can't imagine how many bloggers using Blogware are being robbed of readers from both Google search and Google Reader. Some companies are using Blogware for their corporate blog and unknowingly rejecting potential clients. I even noticed that some Blogware blogs were removed from the Google index, most likely related to this flaw. To replicate, simply subscribe to the Qumana blog in Google Reader and click thru to one of their blog entries. You'll get...
access from http://www.google.com/reader/view/ has been denied
David Sifry: I continue to marvel at it, but the blogosphere continues to grow at a quickening pace. Technorati currently tracks 35.3 Million weblogs, and the blogosphere we track continues to double about every 6 months.
Randy: And for the next month everybody will be saying the blogosphere is doubling every 6 months. Just one problem. Technorati recently added MySpace blogs to those they are tracking, a.k.a. another variable in the mix that means you can't equate growth in the Technorati index to growth in the blogosphere. Further, the blogosphere is made up of well over 100 million weblogs, so effectively Technorati is tracking what they are capable of tracking and not nearly everything. So, if you hear in the next week somebody saying "the blogosphere doubled in 6 months cause Sifry said so", then feel free to call them a liar. And if Sifry could retitle his "State of the Blogosphere" as "Technorati's inability to keep pace with the blogosphere", then I'd feel we're being more honest with ourselves.
Last week, Google finally launched Google Calendar. I've long been a user of Yahoo! Calendar and long hated thick-client calendar applications like Outlook. Google Calendar is a large step forward in online calendars. Much easier to use. Very responsive. Support for Atom feeds. Just one problem. How do I export the calendar data from my Yahoo! and import that into Google? I've built up a large set of entries over the years that I've been using Yahoo! It would take several hours to re-input the data into Google.
Update: Microsoft intends to keep up with the Joneses.
Matthew Mullenweg: I'm not even talking about deciding they can change the world by decree. (Which has already been addressed.) The latest in their line of enlightened changes is that the author of the Well-formed Web spec has changed the capitalization of the wfw:commentRSS element at some unknown point to lowercase Rss. This arbitrary decision has been codified by the validator, which now reports the millions and millions of feeds that use the previously correct capitalization as invalid. [cut] But wait, there's more. "In addition, this feed has an issue that may cause problems for some users." They've also started marking all uses of content:encoded as potentially causing problems, which is funny because it actually avoids a ton of problems and (again) people have been using it in RSS 2.0 feeds for 3+ years now, and I even asked Dave Winer about it in the past and he said that was fine. Their documentation on the topic seems more geared toward instilling fear, uncertainty, and doubt in RSS 2.0 than addressing the reason they've decided to start warning about this element.
Randy: I'm going to both agree with Matt and take the blame. In my pursuit to avoid the noise on the RSS mailing list, I failed to address these issues when raised. Some have taken it that the random chatter on the RSS mailing list is somehow the rule of the day. It's NOT!
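For aggregator authors caught in the middle, a reader can simply accept both capitalizations. What follows is a minimal sketch using Python's standard xml.etree.ElementTree. The helper names and the sample feed are assumptions for illustration only. The two namespace URIs are the ones commonly used for the wfw and content modules.

```python
import xml.etree.ElementTree as ET

WFW = "http://wellformedweb.org/CommentAPI/"
CONTENT = "http://purl.org/rss/1.0/modules/content/"


def comment_feed_url(item):
    """Return the comment feed URL, accepting either capitalization."""
    for name in ("commentRss", "commentRSS"):
        el = item.find(f"{{{WFW}}}{name}")
        if el is not None and el.text:
            return el.text.strip()
    return None


def item_body(item):
    """Prefer content:encoded and fall back to description."""
    encoded = item.find(f"{{{CONTENT}}}encoded")
    if encoded is not None and encoded.text:
        return encoded.text
    return item.findtext("description", default="")


# Hypothetical feed that still uses the older capitalization.
sample = """<rss version="2.0"
  xmlns:wfw="http://wellformedweb.org/CommentAPI/"
  xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>Old capitalization</title>
      <description>Summary</description>
      <content:encoded><![CDATA[<p>Full post</p>]]></content:encoded>
      <wfw:commentRSS>http://example.com/comments/1.xml</wfw:commentRSS>
    </item>
  </channel>
</rss>"""

item = ET.fromstring(sample).find("channel/item")
print(comment_feed_url(item))
print(item_body(item))
```

Being liberal on the reading side costs a few lines and keeps older feeds working, whatever the validator decides to report.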
PR: Combined spending on blog, podcast and RSS advertising bolted 198.4% to $20.4 million in 2005, and is expected to grow another 144.9% to $49.8 million in 2006.
Davis Janowski: You'll find dozens for free: Web-based ones, add-ins, extensions, sidebars, Microsoft Outlook-integrated plug-ins, even standalone software (and plenty more are coming). So why buy one? Because none of the free alternatives offer the same level of organization and filtering abilities, or breadth of features, as FeedDemon.
Niall Kennedy, the ex-Technorati evangelist, ended his career-courting adventure with a job at Microsoft. Congrats!
I've noticed in the last few months that Wikipedia is expending a lot of energy internalizing the links on their Website. This became too obvious when they deleted the List_of_news_aggregators Wikipage. The reasons given include...
The page was clearly not external link spam. This is not the first recent instance where I've seen pages get deleted because of external links that I thought were entirely useful. If Wikipedia is gonna start removing outbound links, then I'm gonna stop creating inbound links. Also, I'm going to grab the last good version of that page and host it on this blog.
Update: I found out that once a Wikipedia article is deleted that you can't get at the history. A lot of people, including myself, put a lot of effort into maintaining that list. Now because of the Wikipedia zealots, all the information appears to be lost forever. Thus I changed the title of this blog entry from Wikipedia internalizing to Wikipedia sucks! This is no longer a community project, it's a bureaucratic black-hole. If anybody can recover that content and send it to me via email or post it in my comments, then I'd be greatly appreciative.
Nico Hines: Teenagers are so obsessed with the site [http://www.faceparty.com/] that last year it saw more traffic than Yahoo's email service, Tesco's website and Amazon. Only eBay, Google and Hotmail are viewed more often in Britain. The site allows its 6 million members to send each other messages.
Walter of the Microsoft Team RSS has posted an extensive article on techniques supported by the Windows RSS platform for reducing bandwidth. I was glad to see they referenced my HowTo RSS Feed State article and implemented most everything I wrote about. They even implemented the techniques I mentioned, but wrote off as bad ideas because nobody supports them.
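As an illustration of what feed state buys you, here is a minimal sketch of HTTP conditional GET, one widely used bandwidth-saving technique. Treat its place in either article as an assumption. The FeedState class and its methods are hypothetical names, and the header dictionaries are assumed to use exact-case keys. No network call is made: the sketch only shows what gets remembered between polls.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FeedState:
    """Validators remembered from the last successful fetch of one feed."""
    etag: Optional[str] = None
    last_modified: Optional[str] = None

    def request_headers(self) -> Dict[str, str]:
        """Build the conditional headers to send with the next poll."""
        headers = {}
        if self.etag:
            headers["If-None-Match"] = self.etag
        if self.last_modified:
            headers["If-Modified-Since"] = self.last_modified
        return headers

    def update(self, status: int, response_headers: Dict[str, str]) -> bool:
        """Record new validators; return True if the body needs parsing."""
        if status == 304:
            # Not Modified: the server sent no body, nothing to parse.
            return False
        if status == 200:
            self.etag = response_headers.get("ETag")
            self.last_modified = response_headers.get("Last-Modified")
            return True
        # Any other status: keep the old validators and skip parsing.
        return False


state = FeedState()
first = state.update(200, {"ETag": '"abc"',
                           "Last-Modified": "Sat, 01 Apr 2006 12:00:00 GMT"})
second = state.update(304, {})
print(first, second, state.request_headers())
```

A 304 response carries no body, so an unchanged feed costs only a round trip of headers.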
PubSub seems to have stalled. If you check the site-stats for the kbcafe.com domain, then you'll find zero in-links for the last four days and zero out-links for the last three weeks. I can maybe buy that nobody is linking to me :-), but I'm pretty certain I'm linking to a lot of people. What's really weird is that Friday it reported 31 blog entries, but no outbound links. A couple weeks ago, I had a random encounter with Salim Ismail, a founder of PubSub and he gave the impression that things were gonna get better. Unfortunately, they've gotten even worse. On the back of Technorati now reporting an all-time low of nothingness and the near death experience of Feedster, it's become apparent that the blogosphere infrastructure has completely collapsed.
Now don't get me wrong, the problems with PubSub are not restricted to my domain. PubSub is reporting zero links to bbc.co.uk in three of the last four days. And according to PubSub LinkRanks, bbc.co.uk is the #1 linked-to Website.
At a recent geek event, I was talking to a product manager at one of the promising new Website start-ups. Although they were growing, they weren't growing at the pace of the successful Web 2.0 start-ups like Flickr or YouTube. The product manager was asking for ideas on jump starting his Website. I asked if he had considered an API, where third party developers could tap into their database (their database is very impressive). He asked what I meant. I tried to explain the benefit of open APIs and how they could be used to recruit new users. It then dawned on him that they already do this. You see, a user could go to their site and create some data, then they could post that data on their blog. NO! I exclaimed. What if a blog tool could use your API to create some data in your database and post it to the blog? He responded, yes, we already do that, and gave me a "do you know what we do?" look. He couldn't make the leap that Websites could work seamlessly together and share user data, rather than the user having to cut-n-paste HTML from one Website to another. BTW, the well-known Web 2.0 start-up is often associated with the phrase "we need APIs for everything." I really don't think most Web 2.0 techies and BAs understand the power of APIs beyond RSS.
In a second example, I discovered an API for another Web 2.0 startup. Investigating the API, I found it lacked one important ingredient. It didn't use XML. It used an alternate well-known data encoding, but now I had to write a new encoder for my data, rather than using my trusty old DOM parser. Had the API been XML, then I'd be already using it. Now there's a risk that I might never use it.
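To show why the trusty old DOM parser matters, here is a minimal sketch of consuming an XML API response with Python's standard xml.dom.minidom. The response document, its element names and the titles function are all hypothetical. They do not describe the startup's API.

```python
from xml.dom.minidom import parseString

# Hypothetical API response; the element and attribute names are made up.
response = """<items>
  <item id="1"><title>First</title></item>
  <item id="2"><title>Second</title></item>
</items>"""


def titles(xml_text):
    """Return (id, title) pairs using nothing but generic DOM calls."""
    doc = parseString(xml_text)
    result = []
    for item in doc.getElementsByTagName("item"):
        title = item.getElementsByTagName("title")[0]
        result.append((item.getAttribute("id"), title.firstChild.data))
    return result


print(titles(response))
```

Nothing here is specific to one vendor, which is the point: the same few DOM calls work against any XML API, so trying a new one costs minutes.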
LeftLaneNews: Honda will soon add Google Earth to its "internavi Premium Club" navigation service in Japan.
In another April Fool's joke that I unfortunately missed till just now, Meetro announced they had acquired Friendster.
PR: Meetro, the world leader in Instant Messaging software announced today it has entered into an agreement to acquire Friendster, one of many social software sites for keeping in touch with friends and hooking up with new people.
Wikipedia has a Wiki page List of blogging terms. A couple new terms, I hadn't heard before (and like).
A few months ago, I ran into this situation where I got a 403 error when I clicked thru to a friend's blog from a Google search. I refreshed, the blog loaded and I forgot about it until about 2 weeks ago. I ran into the same error, again via click-thru to another friend's blog from a Google search. The error? HTTP 403 with the text "access from http://www.google.com... has been denied". I removed the query parameters from the Google link. I was gonna tell the person, but a refresh cleared the problem and I never recalled the error until today. Again, via click-thru to another blog from a Google search, I get HTTP 403 access from http://www.google.com... has been denied. I tied all three incidents together and realized they were all running Blogware. I then Googled 403 Google Blogware and found that others have experienced this problem as far back as early January. I confirmed the problem on multiple browsers; IE6 and Firefox. The problem is not regional either, as Coolz0r confirmed similar errors in Europe. The bug seems to be caused by referrer SPAM filters on steroids. I sent an email to a friend who works at Tucows.
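I have no idea what Blogware's filter actually looks like, so the following is purely a hypothetical sketch of how a referrer SPAM filter can end up "on steroids". It assumes a keyword blocklist matched against the whole referrer URL, so an innocent search click gets rejected because of a word in the query string. The keywords, function names and URLs are all invented for illustration.

```python
from urllib.parse import urlsplit

BLOCKED_KEYWORDS = ("casino", "pills")  # hypothetical spam terms


def naive_is_blocked(referrer):
    """Over-eager check: matches keywords anywhere in the URL."""
    url = referrer.lower()
    return any(word in url for word in BLOCKED_KEYWORDS)


def host_only_is_blocked(referrer):
    """Narrower check: matches keywords in the hostname only."""
    host = (urlsplit(referrer).hostname or "").lower()
    return any(word in host for word in BLOCKED_KEYWORDS)


# A legitimate reader arriving from a search, and an obvious spam referrer.
search_click = "http://www.google.com/search?q=casino+royale+review"
spam_referrer = "http://cheap-pills.example/landing"

for check in (naive_is_blocked, host_only_is_blocked):
    print(check.__name__, check(search_click), check(spam_referrer))
```

The narrower check still catches the spam referrer in this example without turning away a reader who arrived from a search engine.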
It took 53 minutes for Mihai Parparita of Google to track down my problem and move it into the next release. He also sent me my OPML reading list via private email. A second employee, Jason Shellen was also all over my blog in just over one hour and moving into repair mode. It felt like a billion dollar company was at my command. That's why I use Gmail. That's why I use AdSense. That's why I use Google Search. That's why I use AdWords.
Judging from my subscriber reporting, a lot of people have been checking out NewsAlloy, an ajax-based RSS reader. So today, I checked it out too. It was pretty awesome. So good, that I decided to switch from Google Reader to NewsAlloy for awhile to give it a real test.
Niall Kennedy: The latest version of Google's Firefox toolbar adds support for feed subscriptions to online aggregators with just one click. [cut] I was able to choose between Bloglines, Google Personalized homepage or Google Reader, My Yahoo!, NewsGator Online, and Pluck.
Although I'm not the biggest fan of the quality of Technorati results, I find that it hit a new low today. When doing a search for my domain, I'm noticing that Technorati search results have now become completely useless. Here are the results I was presented with today. All of these results are reported in the last 12 hours.
Six splogs (Google Juice removed). Six are not even blogs. Only three are actually non-splog blogs. But all three are blog entries from more than 3 months ago. Dave, what's up?
Q: What do you think differentiates Feedlounge from other web-based readers like Rojo and Bloglines?
A: I think the primary difference is the powerful and efficient reading interface FeedLounge delivers.
Randy: I'd like to add, that from my own feel of the market, more RSS aggregators support Atom 0.3 than Atom 1.0, but I don't have any data to back that up. That said, you can be certain that support for Atom 1.0 is increasing. Whether it will ever rival RSS 2.0 is another story. Is Atom 0.3 invalid? Let's examine the version number. It starts with a zero. RSS 0.91 also starts with a zero. If I were building software and I published a format whose version number started with a zero when a newer version that starts with a positive integer is available, then I'd have some explaining to do. Is Atom 0.3 or RSS 0.91 invalid? NO! Are they deprecated? YES! Should you publish them? NO! Do RSS vendors have the right to say they do not support those versions without repercussion? YES. IMHO.
In fact, I think I'm gonna remove support for Atom 0.3 and RSS 0.9x from Rmail. Sorry Blogspot. Get with the plan.
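For the curious, telling the deprecated versions apart is not hard. Here is a minimal sketch using Python's standard xml.etree.ElementTree. It is not Rmail's code. The function names are hypothetical, and it relies on the Atom 0.3 and Atom 1.0 namespace URIs and on the version attribute of the rss root element.

```python
import xml.etree.ElementTree as ET

ATOM_03_NS = "http://purl.org/atom/ns#"
ATOM_10_NS = "http://www.w3.org/2005/Atom"


def feed_format(xml_text):
    """Classify a feed by its root element and namespace."""
    root = ET.fromstring(xml_text)
    if root.tag == f"{{{ATOM_10_NS}}}feed":
        return "Atom 1.0"
    if root.tag == f"{{{ATOM_03_NS}}}feed":
        return "Atom 0.3"
    if root.tag == "rss":
        return "RSS " + root.get("version", "unknown")
    return "unknown"


def is_deprecated(fmt):
    """Flag the zero-dot versions discussed above."""
    return fmt == "Atom 0.3" or fmt.startswith("RSS 0.9")


samples = [
    '<feed xmlns="http://purl.org/atom/ns#" version="0.3"/>',
    '<feed xmlns="http://www.w3.org/2005/Atom"/>',
    '<rss version="0.91"><channel/></rss>',
    '<rss version="2.0"><channel/></rss>',
]
for text in samples:
    fmt = feed_format(text)
    print(fmt, is_deprecated(fmt))
```

An aggregator could use a check like this to warn the publisher rather than silently failing on the feed.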
About: ://URLFAN is an evolving experiment designed to discover what websites the blogosphere is discussing all in real time. It does this by cultivating the content of thousands of RSS feeds and parsing billions of pieces of information.
In the last few days, Technorati began indexing MySpace blogs. Aaron Brazell says "The quality of the Technorati results has diminished in my eyes" and "99% of MySpace blogs are not blogs". I have to disagree. Blogs are blogs. Whether they have quality content or not, does not make them any less blogs. A newspaper that contains crappy articles is still a newspaper. A bad book is still a book. Does indexing MySpace blogs reduce the quality of Technorati? I don't see how it can, I've not seen any quality there in quite some time.
It appears that Microsoft is getting involved in the corporate blogger swap between Yahoo and Google. The latest news has Robert Scoble and Jeremy Zawodny moving from Microsoft and Yahoo! to Google. Going the other way are 80% of Matt Cutts to Yahoo! + 2546 computers and the other 20% of Matt to Microsoft. Does this mean one Jeremy is worth 4 Scobles and 2546 computers?
Last October and January, I wrote pieces on the state of the splogosphere, that is the ability of our blog infrastructure to handle SPAM, SPAM blogs, blog comment SPAM and spings. In these pieces, I talked mostly about the splog problem that was rooted in Google's Blogspot hosting service and the inability of blog search engines to filter the splogging noise.
Here we are six months later and nothing much has changed. Splogs are still everywhere and the search engines are struggling with result pages that are littered with splogs. Let's examine a few blogosphere search engines and score the amount of splog found compared to useful results. Let's compare search results for my primary domain; kbcafe.com.
Currently, I'm using a combination of BlogPulse, IceRocket and Google blog search. All three do a good job of filtering splogs and still report a lot of new referrers.
One thing that has changed is the preferred splogging framework. Six months ago, almost every splog was found on Blogspot, but thanks to a lot of effort on Google's part, this is no longer true. Sploggers now prefer the self-hosted WordPress platform. Don't get me wrong, there's still lots of splogs on Blogspot, but the search engines and Google have teamed up to reduce the number of those splogs that are appearing in the blog search result pages.
There's a new evil in the blogosphere and that's blog comment SPAM. The amount of blog comment SPAM is not only increasing, but the spammers are writing relevant comments that are less likely to get removed by the blog's author. Some blogging platforms are simply inadequate at stopping blog comment spam. I have a Blogspirit test blog and if you check the right sidebar, the comments are dominated by blog comment spam and I really have no idea how to stop this. I've even tried to disable comments, but the software seems to be broken in this regard.
Another new evil in the blogosphere is spings. Spings are blogosphere pings on behalf of splogs (or fake blogs). The end user doesn't really see this problem, but search engines like Technorati do. David Sifry is reporting that the majority of blogosphere pings are actually spings.
Conclusion? We're not getting anywhere. Splogs are devaluing the blogosphere, as much as email SPAM is devaluing email. The problem is that governments move at a slower speed than the Internet. A spammer or splogger is a millionaire before the authorities know how to deal with them. The solution must come from the private sector.