How many Dutch use Last.FM?

48003. That's the number, although I might be wrong: I only counted users who have a couple of friends and who haven't set their profile to private.
A friend of a friend recently asked me whether I knew how to crawl Last.FM's user database. I didn't, but they do have a nice API, and writing such a crawler seemed like a welcome change. The code is on Bitbucket (GPLv2, Ruby, <300 LOC) and should be simple enough to adapt for similar projects.

The crawler is actually pretty simple: it goes through a list of open nodes (i.e. users) and constructs a new set of to-be-visited nodes (i.e. the users' friends). This set is used in the next iteration. That is essentially a breadth-first search (or Dijkstra's algorithm with all edge weights equal). To gather a first set of users, it fetches the members of a couple of groups. All friends of these members who match our criteria (in this case, being located in the Netherlands) are checked in the next iteration.
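The iteration described above can be sketched roughly as follows. This is not the code from the Bitbucket repository, just a minimal illustration: the names `crawl`, `friends_of` and `matches` are made up here, and the friend lookup is a plain hash standing in for whatever API request the real crawler makes.

```ruby
require 'set'

# Hypothetical sketch of the frontier-based crawl.
# friends_of: callable returning a user's friends (stand-in for an API request)
# matches:    callable deciding whether a user meets our criteria
def crawl(seeds, friends_of:, matches:)
  visited    = Set.new
  open_nodes = seeds.select { |user| matches.call(user) }.to_set
  iterations = 0

  until open_nodes.empty?
    visited.merge(open_nodes)
    next_nodes = Set.new
    open_nodes.each do |user|
      friends_of.call(user).each do |friend|
        next if visited.include?(friend)          # already handled
        next_nodes << friend if matches.call(friend)
      end
    end
    open_nodes = next_nodes                       # frontier for the next round
    iterations += 1
  end

  [visited, iterations]
end

# Toy data instead of real API responses (all names invented).
graph = {
  'anna'  => %w[bram carl],
  'bram'  => %w[anna dirk],
  'carl'  => %w[anna],
  'dirk'  => %w[bram emma],
  'emma'  => %w[dirk],
  'fleur' => []               # no friends, so never discovered unless seeded
}
country = {
  'anna' => 'Netherlands', 'bram' => 'Netherlands', 'carl' => 'Germany',
  'dirk' => 'Netherlands', 'emma' => 'Netherlands', 'fleur' => 'Netherlands'
}

found, rounds = crawl(
  ['anna'],
  friends_of: ->(user) { graph.fetch(user, []) },
  matches:    ->(user) { country[user] == 'Netherlands' }
)
puts "#{found.size} users found in #{rounds} iterations"
```

Passing the lookup and the filter in as callables keeps the loop itself free of any API details, so the same sketch would work for a different criterion or a different site.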

The problem with this algorithm? It will only find those users who are either in one of the initial search groups or connected to this set via a path of friends. The Small-World Experiment suggests that it is very unlikely for several large connected components to exist in isolation from each other, but I'm sure many users have not friended anyone at all, and these will not show up in the results. The code was never meant to find an exhaustive set of users, though; it was meant to be small and simple, so this flaw is OK.

Fun fact: the diameter of the graph (i.e. the longest of the shortest paths between any two of its vertices), as defined by the Dutch users and their friend-connections, is ≥ 14 -- the crawler took 14 iterations to complete. Much more than I'd have imagined!
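Why does the crawl depth give a lower bound on the diameter? A user first reached at depth d is at distance d from the nearest starting user, so at least one shortest path in the graph has length d or more. The toy example below illustrates this; `bfs_depth` and `diameter` are names invented for this sketch, and whether "iterations" equals the depth or the depth plus one depends on how the rounds are counted.

```ruby
# Largest distance from the nearest source to any reachable node.
def bfs_depth(graph, sources)
  dist = {}
  sources.each { |s| dist[s] = 0 }
  queue = sources.dup
  until queue.empty?
    node = queue.shift
    graph.fetch(node, []).each do |neighbour|
      next if dist.key?(neighbour)     # already reached by a shorter path
      dist[neighbour] = dist[node] + 1
      queue << neighbour
    end
  end
  dist.values.max
end

# Diameter by brute force: a BFS from every vertex.
# Assumes a connected graph; unreachable vertices are simply ignored.
def diameter(graph)
  graph.keys.map { |v| bfs_depth(graph, [v]) }.max
end

# A chain of five users (invented data).
chain = {
  'a' => %w[b], 'b' => %w[a c], 'c' => %w[b d], 'd' => %w[c e], 'e' => %w[d]
}

puts bfs_depth(chain, ['c'])   # => 2, crawling from the middle
puts diameter(chain)           # => 4, the path from a to e
```

The depth seen from one particular set of starting users never exceeds the diameter, but it can be a good deal smaller, which is why the crawl only yields "≥".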