📅 October 23, 2011 • 🕐 14:09 • 🏷 Blog

For a small dataviz experiment I wanted to create maps of books by extracting the locations mentioned in the text (cities, countries, continents, whatever comes up) and drawing them on a map. I used the Stanford Named Entity Recognizer to extract the locations from two books: the Bible and Herman Melville’s Moby Dick. I then wrote a small Python script to retrieve the latitude and longitude of each location using the Google Geocoding API, threw it all into a CSV file and drew it on a map using GeoCommons. I also gave each location an ascending date, following the order in which it appears, so that GeoCommons can animate the extracted locations over time.
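The script itself is not reproduced here, but the steps above can be sketched roughly as follows. This is a minimal reconstruction, not the original code. It assumes the NER output was written in Stanford NER's inline-XML style (`<LOCATION>…</LOCATION>` tags), it uses the present-day form of the Geocoding API endpoint (which needs an API key), and the function names, the CSV column names and the arbitrary start date are all made up for illustration.

```python
import csv
import json
import re
import urllib.parse
import urllib.request
from datetime import date, timedelta

# Assumption: the tagged text marks places as <LOCATION>...</LOCATION>.
LOCATION_TAG = re.compile(r"<LOCATION>(.*?)</LOCATION>", re.DOTALL)


def extract_locations(tagged_text):
    """Return every location mention, in reading order, duplicates included."""
    # Collapse internal whitespace so names split across lines still match.
    return [" ".join(m.split()) for m in LOCATION_TAG.findall(tagged_text)]


def google_geocode(name, api_key):
    """Look up one place name; return (lat, lng), or None if nothing is found."""
    query = urllib.parse.urlencode({"address": name, "key": api_key})
    url = "https://maps.googleapis.com/maps/api/geocode/json?" + query
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    if data.get("status") != "OK":
        return None
    loc = data["results"][0]["geometry"]["location"]
    return loc["lat"], loc["lng"]


def build_rows(mentions, geocode, start=date(2000, 1, 1)):
    """Geocode each mention and attach an ascending (purely artificial) date."""
    cache = {}  # each distinct name is looked up only once
    rows = []
    for i, name in enumerate(mentions):
        if name not in cache:
            cache[name] = geocode(name)
        coords = cache[name]
        if coords is None:
            continue  # the geocoder found nothing for this name
        rows.append({
            "name": name,
            "lat": coords[0],
            "lon": coords[1],
            # One day per mention, so the animation follows the text order.
            "date": (start + timedelta(days=i)).isoformat(),
        })
    return rows


def write_csv(rows, out):
    """Write the rows to an open text file, one line per mention."""
    writer = csv.DictWriter(out, fieldnames=["name", "lat", "lon", "date"])
    writer.writeheader()
    writer.writerows(rows)
```

To run it for real, pass something like `lambda name: google_geocode(name, "YOUR_KEY")` as the `geocode` argument of `build_rows`. Keeping one row per mention, rather than one row per place, is what lets the overlapping circles darken on the map.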

The darker a circle, the more mentions it got (I set the circles’ opacity to 10%, so overlapping circles automatically darken). There were some issues with false positives (Stanford NER identifying persons as locations). And while I didn’t really know what to expect, I was glad to see that the major clusters in both maps did seem to make sense (Nantucket in Moby Dick, the area around Jerusalem in the Bible). The Bible geomap shows that a lot of places (particularly in the United States) seem to be named after Biblical locations and names. The cluster on the West Coast of the US seems as big as the Middle Eastern cluster, but once you zoom in it becomes clear that it is less tightly packed. Moby Dick’s geomap shows a lot of locations around coastal areas, which seems to make sense; the book also mentions a lot of oceans and seas.
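As a side note on the 10% opacity trick: if the map stacks the circles with ordinary alpha blending (an assumption, I have not checked how GeoCommons renders them), then n circles on the same spot cover 1 - (1 - 0.1)^n of the background. A small sketch, with a made-up function name:

```python
def stacked_opacity(n, alpha=0.10):
    """Combined opacity of n identical circles drawn on top of each other.

    Assumes standard "source-over" alpha blending: each new circle
    lets a fraction (1 - alpha) of what is underneath show through.
    """
    return 1.0 - (1.0 - alpha) ** n


for n in (1, 2, 5, 10, 20):
    print(n, round(stacked_opacity(n), 2))
```

Under that assumption two mentions give roughly 19% and ten mentions roughly 65%, so the darkening flattens out for heavily mentioned places and never quite reaches solid black.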

Here is the interactive map on GeoCommons, where you can animate the results and click on individual locations:

View map on GeoCommons