UnseenPlacesUSA Documentation


[Github Link coming soon]


UnseenPlacesUSA is a Twitter bot and dataset containing the names, descriptions, and geographic coordinates of ‘unseen places’ in the United States. These places are locations that go unnoticed because of their remote location, or because we choose to put them out of mind. The bot tweets these places with a sentence describing the location, a Google Maps link, and a satellite photo. The ‘unseen-ness’ of these locations is subjective: a prison is only unseen if you do not know anyone in the prison system, and a power plant is only unnoticed if it is not in your neighborhood. Even so, I believe that most of these places are unfamiliar to many people. I hope that by recording the locations and making them more public, people can discover places they have never heard of, but more importantly that neglected places will be reconsidered. Finding an unseen place is an opportunity to consider why that place might be unseen, whether its neglect is appropriate, and what that might say about us.

A Twitter bot is an excellent way to perform this data. It allows the places to be considered individually, with a degree of measure. The bot also feels like it is sharing a secret, which is exciting. Pushing the places into a conversational sphere invites discussion, and bringing these often remote locations into an intimate space (a tweet seen on someone’s phone or computer) contrasts both the size of the physical location and the scope of the systems that the locations represent.


The UnseenPlacesUSA Twitter bot is built on Node.js using Twit, and the places data is stored in MongoDB. The main challenge of this project was collecting the data itself. I started the dataset with a list of unseen places that I thought would be interesting and then tried to find location data for those places. Most of the data comes from Wikipedia, which contains many lists of locations, such as federal prisons, wind farms, and national monuments. I wrote a web scraper using node-scrapy that runs through a list of location names, searches for the Wikipedia page, and then scrapes the location data from that page. If there is no page or no location data, the scraper writes the place name into a file so I can look up the information manually later.
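The scraper's routing step can be sketched roughly like this. The function and field names here are hypothetical (the real scraper's code is not shown in this post, and it uses node-scrapy to pull the coordinates from each Wikipedia page); the sketch only illustrates the described fallback behavior of collecting records with coordinates and queuing the rest for manual lookup.

```javascript
// Hypothetical routing step: given a place name and the data scraped for
// it (or null when the page or its coordinates are missing), either
// collect a database-ready record or queue the name for manual lookup.
function routePlace(name, scraped, places, missing) {
  if (scraped && scraped.lat != null && scraped.lon != null) {
    places.push({ name, lat: scraped.lat, lon: scraped.lon });
  } else {
    // The real script appends the name to a file instead, e.g.:
    // fs.appendFileSync('missing.txt', name + '\n');
    missing.push(name);
  }
}

// Example run over two scrape results (coordinates approximate):
const places = [];
const missing = [];
routePlace('ADX Florence', { lat: 38.356, lon: -105.095 }, places, missing);
routePlace('Some Unlisted Feedlot', null, places, missing);
console.log(places, missing);
```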

Other data comes from hobbyist sites (the missile silo data especially) and had to be converted from sexagesimal notation to decimal notation, using formulas I found online. I also wrote a script that takes a street address and converts it into decimal coordinates using the Google Maps API. This was particularly useful for datasets that only contain street addresses, such as the list of cattle feedlots I copied from the American Angus Association.
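The sexagesimal-to-decimal conversion is a small calculation: decimal degrees = degrees + minutes/60 + seconds/3600, negated for southern latitudes and western longitudes. A minimal sketch (illustrative, not the exact formulas the post refers to):

```javascript
// Convert a sexagesimal (degrees-minutes-seconds) coordinate to decimal
// degrees. Hemisphere 'S' or 'W' makes the result negative.
function dmsToDecimal(degrees, minutes, seconds, hemisphere) {
  const decimal = degrees + minutes / 60 + seconds / 3600;
  return (hemisphere === 'S' || hemisphere === 'W') ? -decimal : decimal;
}

// A missile-silo-style coordinate pair, e.g. 41°12'30"N 104°49'15"W:
const lat = dmsToDecimal(41, 12, 30, 'N');
const lon = dmsToDecimal(104, 49, 15, 'W');
console.log(lat.toFixed(5), lon.toFixed(5)); // 41.20833 -104.82083
```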

All of these scripts convert name, location, and description data into a document in my database. I chose to use a database instead of a JSON file to give this project room to grow in the future. I was also happy to have the chance to learn about using databases.
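A place document might look something like the sketch below. The field names are illustrative assumptions (the post does not show the actual schema), and the commented-out lines show roughly how the record would be stored with the official MongoDB Node.js driver.

```javascript
// Hypothetical shape of one place document (field names are assumptions;
// coordinates approximate).
const place = {
  name: 'ADX Florence',
  description: 'A federal supermax prison in Fremont County, Colorado.',
  category: 'federal prison',
  location: { lat: 38.356, lon: -105.095 },
};

// With the MongoDB Node.js driver, storing the record would look roughly
// like this (requires a running MongoDB instance):
//
//   const { MongoClient } = require('mongodb');
//   const client = await MongoClient.connect('mongodb://localhost:27017');
//   await client.db('unseenplaces').collection('places').insertOne(place);

console.log(Object.keys(place));
```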

I had originally planned on setting up some kind of web interface for adding locations to the database, but after processing all the data I have collected so far, it has become clear to me that most datasets are individual enough that writing the code for a site that can handle them all would be more work than simply tweaking the templates I have already created.

Next Steps

In the short term, I would like to build a small dashboard for the dataset, so I can see at a glance what kinds of places are in the dataset and how many of each. I also think that it might be worth doing more research into the data I already have. For example, I am interested in differentiating between publicly run state prisons and privately run state prisons.

Something else worth considering is how important completeness is for this project. It is not important to have an exact and complete list of all the landfills in New York State, for example, when the information is being tweeted. Each tweet is individual and is not considered as part of a whole. However, when the same information is shown on a map, missing information might become more visible and important. Data omissions also have meaning.

There are also potential new features for the bot. It would be interesting for the bot to be able to tell someone an unseen place, if they tweet a location at the bot. The bot could also be a good way for people to suggest locations they would like to add. Sharing on Twitter is a great way for the project to gain visibility.

I am also excited to explore what kind of future projects this data might lend itself to. I am personally interested to see what it looks like when I bring up all of the satellite images for a state. Will there be commonalities I had never noticed before?
And of course, there is always the ongoing work of finding more unseen places to add.
