October 23, 2003

Ann Arbor is both overrated and RSSed

I found out that it's surprisingly easy to make an RSS screen scraper with Template::Extract. I was able to whip up an RSS feed for ann arbor is overrated in about 15 minutes. This guide to Template::Extract and RSS had all the code I needed, although I modified it slightly to be a cron job instead of a CGI. Enjoy.
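
For readers who want a feel for the approach without Perl, here is a minimal sketch in Python. It is not the script described above and does not use Template::Extract; it swaps in a regular expression with named groups as a rough stand-in for an extraction template. The sample markup, the `extract_entries` and `build_rss` names, and the example.com URLs are all made up for illustration, since the real page's HTML isn't shown here.

```python
import html
import re
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the scraped page.
SAMPLE_HTML = """
<div class="entry"><a href="http://example.com/1">First post</a><p>Hello &amp; welcome</p></div>
<div class="entry"><a href="http://example.com/2">Second post</a><p>More text</p></div>
"""

# One named group per field we want, roughly like placeholders in a template.
ENTRY_PATTERN = re.compile(
    r'<div class="entry"><a href="(?P<link>[^"]+)">(?P<title>[^<]+)</a>'
    r"<p>(?P<description>.*?)</p></div>",
    re.DOTALL,
)


def extract_entries(page):
    """Return one dict per matched entry, with HTML entities decoded."""
    return [
        {key: html.unescape(value) for key, value in match.groupdict().items()}
        for match in ENTRY_PATTERN.finditer(page)
    ]


def build_rss(channel_title, channel_link, entries):
    """Build a small RSS 2.0 document; ElementTree handles the XML escaping."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Scraped feed"
    for entry in entries:
        item = ET.SubElement(channel, "item")
        for field in ("title", "link", "description"):
            ET.SubElement(item, field).text = entry[field]
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    # A cron job would fetch the live page and write this to a file instead.
    print(build_rss("Example", "http://example.com/", extract_entries(SAMPLE_HTML)))
```

Run from cron, a script like this would fetch the page, rebuild the feed, and overwrite a static .rss file, so nothing has to execute when a reader requests the feed.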

Posted by george at October 23, 2003 03:46 PM
Comments and TrackBacks

TrackBack URL: http://mt.gnerd.net/mt-gnerd-tb.cgi/125

Your link to the Template::Extract guide is malformed... "ref" instead of "href."

Posted by: Andy at October 23, 2003 05:43 PM

Thanks for spotting that; it's fixed now.

Posted by: George Hotelling at October 23, 2003 06:05 PM

If you dig XSLT, I've got another way to do it, mostly using XPath expressions on Tidy'd HTML:

http://www.decafbad.com/blog/tech/xsl_scraper.html

For most sites, I can usually throw together a scraper from a few paths in under 10 minutes.

Posted by: l.m.orchard at October 23, 2003 09:43 PM

Yeah, I'm often bad at responding to e-mail (I'm in the process of switching to a better webmail provider) and I'm also not sure how to set up an RSS feed with Diaryland. But I have comments on my blog now!

I wasn't sure what to think about this - then I remembered you'd asked about it and everything. I usually edit entries for grammar and wording about five times after they're posted, so you might end up with different versions. Anyhow, I'm glad you like the site.

best,
a.a.i.o.

Posted by: ann arbor is overrated at October 30, 2003 01:32 AM

The comments look good; I think they'll make your site even better. I did some research on setting up RSS on Diaryland but couldn't find anything.

I think most bloggers edit their entries a few times before deciding they like what they see. My script runs every 4 hours and builds the RSS feed from whatever is on your site at that moment. If you're in the middle of updating an entry when it runs, the rest of your changes will show up in the feed at the next run, 4 hours later. I can change the 4-hour interval if you want...

You should also add this tag to your <head>:
<link rel="alternate" type="application/rss+xml" title="RSS" href="http://george.hotelling.net/rss/annarborsucks.rss">
to tell RSS readers where to find your RSS feed.
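
As a rough illustration of what a reader does with that tag, here is a small Python sketch that scans a page for feed autodiscovery links. It is only an assumption about how a typical reader behaves, not the code of any particular one; the `FeedLinkFinder` class, the `find_feeds` helper, and the sample page around the tag are made up for the example.

```python
from html.parser import HTMLParser


class FeedLinkFinder(HTMLParser):
    """Collect href values from <link rel="alternate" type="application/rss+xml"> tags."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        mime_type = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if "alternate" in rel_values and mime_type == "application/rss+xml" and href:
            self.feeds.append(href)


def find_feeds(page):
    """Return every RSS feed URL advertised in the given HTML."""
    finder = FeedLinkFinder()
    finder.feed(page)
    return finder.feeds


# Hypothetical page; only the <link> tag itself comes from the comment above.
PAGE = """<html><head><title>ann arbor is overrated</title>
<link rel="alternate" type="application/rss+xml" title="RSS" href="http://george.hotelling.net/rss/annarborsucks.rss">
</head><body></body></html>"""

if __name__ == "__main__":
    print(find_feeds(PAGE))
```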

Posted by: George at October 30, 2003 08:38 AM
