RSS, OPML and the XML platform.
Copyright 2022 World Readable
One of the most misunderstood elements in RSS is <ttl>. It stands for Time To Live, and it controls how often a feed should be refreshed. In this blog entry, I'll try to cover as much about the element as I can, so that others can make their own decisions on how to use it.
The term itself originates from the Internet Protocol (IP), where TTL is the maximum number of hops (or the amount of time) a packet can live before it is discarded. It's also used in DNS to describe the maximum time a DNS entry is cached before it's refreshed. I'm not an IP or DNS expert, so please don't take those definitions as gospel. At some point, TTL was also used in peer-to-peer (P2P) networks.
The RSS specification has two sections that describe TTL. Under <channel> elements it says:
ttl stands for time to live. It's a number of minutes that indicates how long a channel can be cached before refreshing from the source.
A separate section on TTL says:
ttl stands for time to live. It's a number of minutes that indicates how long a channel can be cached before refreshing from the source. This makes it possible for RSS sources to be managed by a file-sharing network such as Gnutella.
Although Dave Winer's words are not gospel, they do help. Dave was the primary author of the RSS 2.0 specification. Dave said the following about TTL:
Suppose you have a copy of Gnutella running on your machine, and I have one running on my machine. My machine wants a copy of a certain feed, so it asks your machine if it has it. Your copy of Gnutella looks in its cache, finds a copy of the feed, takes the lastBuildDate, adds its ttl value. If the resulting time is greater than the current time, it says yes, I can give you that, otherwise it says no. If your Gnutella strikes out, if everyone it asks says no, it reads the feed from the feed's server.
What Dave seems to have envisioned is a network of RSS caches. These caches would poll RSS feeds periodically and serve the cached copies to many requesters, reducing the overall bandwidth demand on a popular feed.
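Dave's description boils down to one comparison: lastBuildDate plus ttl minutes versus the current time. The sketch below assumes an RSS 2.0 document whose channel carries both lastBuildDate (an RFC 822 date) and ttl; the function name peer_can_serve is hypothetical, the peer-to-peer lookup itself is left out, and answering "no" when either value is missing is an assumption of this sketch rather than anything Dave specified.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def peer_can_serve(cached_feed_xml: str, now: datetime) -> bool:
    """Hypothetical check: may a peer hand out its cached copy of a feed?"""
    channel = ET.fromstring(cached_feed_xml).find("channel")
    if channel is None:
        return False
    build_text = channel.findtext("lastBuildDate")
    ttl_text = channel.findtext("ttl")
    if not build_text or not ttl_text:
        # Assumption: without both values the peer cannot vouch for freshness.
        return False
    last_build = parsedate_to_datetime(build_text)
    if last_build.tzinfo is None:
        # "-0000" dates parse as naive; treat them as UTC so comparison works.
        last_build = last_build.replace(tzinfo=timezone.utc)
    # Take the lastBuildDate, add the ttl, compare with the current time.
    expires = last_build + timedelta(minutes=int(ttl_text))
    return expires > now


feed = """<rss version="2.0"><channel><title>Example</title>
<lastBuildDate>Mon, 07 Mar 2022 10:00:00 GMT</lastBuildDate>
<ttl>60</ttl></channel></rss>"""
print(peer_can_serve(feed, datetime(2022, 3, 7, 10, 30, tzinfo=timezone.utc)))  # True
print(peer_can_serve(feed, datetime(2022, 3, 7, 11, 30, tzinfo=timezone.utc)))  # False
```

If the check fails on every peer, the requester falls back to reading the feed from its own server, as in Dave's description.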
The most popular feed cache on the Internet is Google's FeedFetcher. FeedFetcher fetches feeds periodically and serves them to several different subsystems and websites run by Google. The RSS Blog has 1,381 Google subscribers. Some of them use Google Reader, others use iGoogle. Imagine if Google fetched The RSS Blog every time a user wanted to see this blog in iGoogle or Google Reader. That would create a lot of web traffic. Instead, Google has one cache that polls the feed several times per day and serves the cached copy to all the subscribers and applications.
Now, we have all these caches out there polling feeds, but how do we tell them how often to poll? If I run a news service whose RSS feeds turn over every 10 minutes, then I want feed caches to poll my RSS feed at least every 10 minutes, maybe even every 5 minutes. On the other hand, if my feed is updated exactly once per month and dissemination of the feed is not urgent, then there's no reason for these caches to re-fetch my feed every hour. Another use case is an Amber Alert RSS feed. This is urgent information that isn't updated very often, and I don't want people looking at a feed that is 50 minutes old. Amber Alerts must get disseminated quickly. In this case, I might set the TTL to 5 minutes.
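On the publishing side, the hint is just an integer number of minutes inside the channel. A minimal sketch using Python's standard xml.etree.ElementTree; build_channel and the example.com URL are hypothetical, rejecting non-positive values is a choice of this sketch, and the TTL values in the usage loop are illustrative picks for the use cases above (the 1440 for the monthly feed is an assumption, not a figure from this post).

```python
import xml.etree.ElementTree as ET


def build_channel(title: str, link: str, description: str, ttl_minutes: int) -> str:
    """Build a minimal RSS 2.0 document whose channel carries a <ttl> hint."""
    if ttl_minutes <= 0:
        raise ValueError("this sketch expects a positive number of minutes")
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the channel's required elements.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    # ttl is a plain count of minutes.
    ET.SubElement(channel, "ttl").text = str(ttl_minutes)
    return ET.tostring(rss, encoding="unicode")


for name, ttl in [("Breaking news", 10), ("Monthly newsletter", 1440), ("Amber Alerts", 5)]:
    print(build_channel(name, "https://example.com/", name, ttl))
```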
The draft RSS Best Practices Profile has more to say about TTL than every other source combined. Here's an excerpt from the profile:
The channel's ttl element represents the feed's time to live: the maximum number of minutes to cache the data before an aggregator requests it again (optional).
By convention, most aggregators check an RSS feed for updates once an hour. The ttl, skipDays and skipHours elements provide a means for publishers to offer guidance regarding a feed's frequency of updates.
Support for ttl appears sparse among aggregators, in part due to disagreement over its meaning in the RSS specification.
Some aggregators use the TTL value as the minimum frequency of update checks on a feed. A test feed with a 90-minute TTL was checked more frequently than that by BlogBridge, Bloglines, FeedDemon, Google Reader, GreatNews, JetBrains Omea, My Yahoo, NewsGator Online and RSSBandit, which either indicates that TTL is used as a minimum or is ignored by the aggregator.
Other aggregators use TTL as the maximum frequency of update checks. Three aggregators won't check a feed more frequently than its TTL: BottomFeeder, Internet Explorer 7 and Opera 9.
Aggregators may check a feed for updates more frequently than its TTL. When an aggregator's cached data for a feed is older than the feed's TTL, the aggregator should request the feed again rather than rely on cached data.
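The two camps described in the profile differ only in which direction the TTL bounds the polling interval. Note the inversion: "minimum frequency" means the TTL caps the interval between checks, while "maximum frequency" means it sets a floor under that interval. A sketch, assuming an aggregator that starts from a user-chosen interval; poll_interval and the interpretation labels are hypothetical names. The profile's closing recommendation corresponds to the "min_frequency" branch.

```python
from typing import Optional


def poll_interval(user_minutes: int, ttl_minutes: Optional[int], ttl_as: str) -> int:
    """Minutes between update checks under one reading of <ttl>.

    "min_frequency": check at least once every TTL minutes, so the interval
        may be shorter than the TTL but never longer.
    "max_frequency": never check more often than every TTL minutes, so the
        interval may be longer than the TTL but never shorter.
    """
    if ttl_minutes is None:
        return user_minutes
    if ttl_as == "min_frequency":
        return min(user_minutes, ttl_minutes)
    if ttl_as == "max_frequency":
        return max(user_minutes, ttl_minutes)
    raise ValueError(f"unknown interpretation: {ttl_as!r}")


# The profile's 90-minute test feed, polled by an hourly aggregator:
print(poll_interval(60, 90, "min_frequency"))  # 60
print(poll_interval(60, 90, "max_frequency"))  # 90
```

An aggregator that ignores TTL entirely would also poll the test feed every 60 minutes, which is why the profile's test cannot tell the first reading apart from no support at all.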
This definition seems a bit too aggressive to me, but I'm generally OK with it as a secondary source. Let's assume a feed cache holds a very unpopular feed that is requested at most once per month. There's no sense fetching that feed every hour just because the TTL says so. Rather, when the feed is requested, the cache should take the time the feed was last pulled, add TTL minutes, and use the result to decide whether to refresh from the source before responding or to respond with the cached copy.
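The request-driven approach above can be sketched as a small cache that never polls on its own. This is an illustration under assumptions: LazyFeedCache, the fetch callable (assumed to return the feed body and its TTL in minutes, or None) and the 60-minute fallback are hypothetical, and HTTP details such as conditional GET are omitted.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class LazyFeedCache:
    """Refresh a feed from its source only when a request finds the cached
    copy older than its TTL; unpopular feeds are therefore rarely fetched."""

    def __init__(self, fetch: Callable[[str], Tuple[str, Optional[int]]],
                 default_ttl_minutes: int = 60,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch  # hypothetical: returns (body, ttl minutes or None)
        self._default_ttl = default_ttl_minutes
        self._clock = clock
        # url -> (body, ttl in minutes, time of last pull in seconds)
        self._entries: Dict[str, Tuple[str, int, float]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None:
            body, ttl, fetched_at = entry
            # Last pull + TTL minutes is still in the future: serve cached copy.
            if fetched_at + ttl * 60 > now:
                return body
        body, ttl = self._fetch(url)
        self._entries[url] = (body, ttl if ttl is not None else self._default_ttl, now)
        return body


calls = []
def fake_fetch(url):
    calls.append(url)
    return "<rss/>", 30

t = [0.0]
cache = LazyFeedCache(fake_fetch, clock=lambda: t[0])
cache.get("https://example.com/feed")  # first request: fetched from source
t[0] = 20 * 60
cache.get("https://example.com/feed")  # 20 minutes later: served from cache
t[0] = 31 * 60
cache.get("https://example.com/feed")  # 31 minutes later: stale, fetched again
print(len(calls))  # 2
```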
In practice, I've seen several uses of TTL. Many aggregators let the user decide how often a feed is polled, and some of those use the TTL as the default (or 60 minutes if no TTL is present). Some aggregators simply use the TTL as a hint for how often to poll the feed. The draft RSS profile is likely a good source of examples of these behaviors. Most aggregators simply ignore TTL.
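The "TTL as default, else 60 minutes" behavior amounts to reading one element. A minimal sketch with ElementTree; default_poll_minutes is a hypothetical name, and falling back to 60 for malformed or non-positive values is an assumption, not the documented behavior of any particular aggregator.

```python
import xml.etree.ElementTree as ET

DEFAULT_POLL_MINUTES = 60  # the common once-an-hour convention


def default_poll_minutes(feed_xml: str) -> int:
    """Seed a per-feed polling interval from <ttl>, falling back to 60 minutes."""
    ttl_text = ET.fromstring(feed_xml).findtext("channel/ttl")
    if ttl_text is None:
        return DEFAULT_POLL_MINUTES
    try:
        ttl = int(ttl_text.strip())
    except ValueError:
        return DEFAULT_POLL_MINUTES  # ignore values that are not integers
    return ttl if ttl > 0 else DEFAULT_POLL_MINUTES


print(default_poll_minutes('<rss version="2.0"><channel><ttl>90</ttl></channel></rss>'))  # 90
print(default_poll_minutes('<rss version="2.0"><channel></channel></rss>'))  # 60
```

A user-facing setting would then start from this value and let the user change it.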
Make your own decision. Support for TTL remains rare among publishers and clients alike.
FWIW, my Agg tries to walk the line between the two main interpretations.
It assumes that TTLs < 60 refer to caching. The Agg ignores them. It imposes a minimum refresh interval of 1 hour across the board.
It assumes that TTLs >= 60 are minimum refresh intervals. People can't specify a refresh interval lower than the TTL and the TTL is used as the default refresh interval.
Seems to work,
Andy Henderson, CITA
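Andy's policy fits in one function. This is a reconstruction from his comment, not code from The Agg; the function name agg_refresh_interval and the 60-minute default for feeds that carry no TTL are assumptions.

```python
from typing import Optional

MIN_REFRESH_MINUTES = 60  # the across-the-board one-hour floor


def agg_refresh_interval(ttl_minutes: Optional[int],
                         requested_minutes: Optional[int] = None) -> int:
    """Refresh interval in minutes under the policy described in the comment."""
    floor = MIN_REFRESH_MINUTES
    # TTLs under 60 are read as caching hints and ignored.
    if ttl_minutes is not None and ttl_minutes >= MIN_REFRESH_MINUTES:
        # TTLs of 60 or more act as a minimum refresh interval.
        floor = ttl_minutes
    if requested_minutes is None:
        # No user preference: the honoured TTL (or the floor) is the default.
        return floor
    # Users cannot ask for anything shorter than the floor.
    return max(requested_minutes, floor)


print(agg_refresh_interval(30))        # 60: a short TTL is ignored
print(agg_refresh_interval(120, 90))   # 120: the user cannot go below the TTL
print(agg_refresh_interval(120, 180))  # 180: longer intervals are allowed
```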
If there are aggregators that are truly known to treat TTL as the minimum refresh frequency then we should list those and only those. If we don't know of any, then that first statement is a complete fabrication.
I no longer own the Rmail data, so I'm no longer at liberty to provide stats. Sorry.
We might as well create a test feed with a TTL set to one minute, note that no aggregators treat that as the minimum update frequency, and then claim that all aggregators treat it as a maximum or ignore it. The statement would be true (for that particular test case), but misleading.
And we still have this statement: "Some aggregators use the TTL value as the minimum frequency of update checks on a feed." Nothing in the last part of the paragraph changes the meaning of that. So my question is, exactly which aggregators do that? Not which aggregators *might* do that. If we can't name more than one aggregator that is known to interpret TTL as a minimum update frequency (and I certainly can't) then that statement is simply not true.
FWIW I can name six aggregators that treat it as a maximum update frequency, one of which (GreatNews) the profile specifically claims does not support TTL as a maximum frequency.
Bottom line: I think this section needs further discussion, and it would be better if we continued on rss-public.