As some of you know, I've been tracking my book's Amazon.com sales rank recently and comparing it to other books. I stopped because Amazon started spitting back a lot of duplicates and strange data, and no longer gave me easy access to the ranks of books near mine.
Based on comments by other bloggers, I learned that the Amazon rank is pretty unreliable unless you take constant measurements. It's a very fuzzy number: it does not tell you how many total books were sold, only the number of books sold during a specific time period, relative to other books.
This means that new books that sell 100 copies once will be ranked higher than older books that sell 50 copies every day. Plus, one single day without a sale will make your rank plummet. Very frustrating.
Anyway, to get some decent statistics I had to take matters into my own hands. I wrote a quick Python script to screen-scrape Amazon.com, extract my rank, and output it to a spreadsheet file. Nothing fancy, I just wanted a quick dump. Why Python? Because Java would require 10 times as much code, plus about 30 support JAR files from Apache Commons.
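For the curious, here is a minimal sketch of what a script like that could look like. It is not my exact code: the "Sales Rank: #1,234" wording in the regex, the function names, and the two-column CSV layout are all assumptions for illustration, and Amazon's real markup may differ and changes over time.

```python
import csv
import re
import urllib.request
from datetime import datetime

# Assumed page wording; adjust the pattern to whatever the page really contains.
RANK_PATTERN = re.compile(r"Sales Rank:\s*(?:</b>)?\s*#?(\d[\d,]*)")


def extract_rank(html):
    """Return the sales rank as an int, or None if the pattern is not found."""
    match = RANK_PATTERN.search(html)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def append_rank(csv_path, rank, when=None):
    """Append one timestamped rank row to a CSV file that a spreadsheet can open."""
    when = when or datetime.now()
    with open(csv_path, "a", newline="") as handle:
        csv.writer(handle).writerow([when.strftime("%Y-%m-%d %H:%M"), rank])


def record_rank(url, csv_path):
    """Fetch the product page and log its rank; skip the row if parsing fails."""
    with urllib.request.urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    rank = extract_rank(html)
    if rank is not None:
        append_rank(csv_path, rank)
    return rank
```

Calling `record_rank(url, "ranks.csv")` with the book's product page URL adds one row per run. Keeping the parsing in `extract_rank` separate from the download means the regex can be checked against a saved copy of the page without hitting the site.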
I scheduled it with cron to run every hour, so eventually I'll be able to compute a nice moving average of my sales rank. Ideally, these values would be inserted into my database, but I don't think I'll be doing any analysis beyond what Excel already offers, and I have a good backup plan, so I took the lazy route.
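If I ever do move the math out of Excel, the moving average is only a few lines of Python. This is a rough sketch, assuming the hourly readings sit in a CSV file with a timestamp in the first column and the rank in the second; the function names and that layout are hypothetical.

```python
import csv
from collections import deque


def moving_average(values, window):
    """Yield the mean of the latest `window` values, once that many have been seen."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque(maxlen=window)
    total = 0
    for value in values:
        if len(recent) == window:
            total -= recent[0]  # the oldest value is about to be pushed out
        recent.append(value)
        total += value
        if len(recent) == window:
            yield total / window


def load_ranks(csv_path):
    """Read the rank column from a two-column (timestamp, rank) CSV file."""
    with open(csv_path, newline="") as handle:
        return [int(row[1]) for row in csv.reader(handle) if len(row) >= 2]
```

With one reading per hour, `moving_average(load_ranks("ranks.csv"), 24)` gives a rolling one-day average, which should smooth out the hour-to-hour jumps. For the scheduling side, a crontab schedule of `0 * * * *` runs a command at the top of every hour.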
Once I get a decent amount of data, I'll do something fun with it. I promise.