Sunday, July 10, 2016

How to map the photos of your city like OldNYC does

Some months ago I was looking for a way to render on a map the historical photos of Ravenna collected in this Facebook Page. I stumbled upon the OldNYC project and thought it was perfect for my needs! So I forked the project on GitHub, worked on it, and this is the result: www.oldra.it.
In this tutorial, which is intended for programmers, I'll explain the main steps for reusing the OldNYC code (written by @danvdk) and setting up the photo map of your own city.

At the moment my system has a backend hosted in an Ubuntu virtual machine on my PC and a frontend (a static site) hosted in the cloud. Once a week the backend harvests the newest posts of the Facebook Page, regenerates all of the pages of the static site and uploads them to the cloud, where they are served at www.oldra.it.

The system

The development environment and the backend run on an Ubuntu Server 16.04 virtual machine.
  1. First of all I forked the two GitHub repos, https://github.com/danvk/oldnyc for the backend and https://github.com/oldnyc/oldnyc.github.io for the static site, then I cloned them on my server:

     git clone git://github.com/danvk/oldnyc.git
     git clone git://github.com/danvk/oldnyc.github.io.git

     After that my home directory contains the two repo directories, oldnyc and oldnyc.github.io.
  2. I installed Python 2.7 (do not install Python 3): apt-get install python2.7
  3. I installed virtualenv: apt-get install virtualenv
  4. Then I followed the instructions written by @danvdk to set up the environment:
cd oldnyc
virtualenv env
source env/bin/activate
pip install -r requirements.txt
     I did not use Google App Engine, therefore I did not run ./develop.py.
     I also installed the numpy package: pip install numpy
  5. I created some directories necessary for the backend to work:
     mkdir oldnyc/geocache
     mkdir oldnyc/images
     mkdir oldnyc/thumbnails
  6. For testing purposes I installed an Apache web server (apt-get install apache2) and configured it to listen on localhost and serve the content of the directory oldnyc.github.io, which is the static site that will be hosted in the cloud.
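
If you would rather not configure Apache just for testing, any static file server will do. For example, here is a tiny stand-in using Python 2.7's built-in modules (this is just a convenience of mine, not part of the OldNYC code; the path is the one from step 1):

import os
import SimpleHTTPServer
import SocketServer

# Serve the static site on http://localhost:8080 (path assumed from step 1).
os.chdir(os.path.expanduser('~/oldnyc.github.io'))
httpd = SocketServer.TCPServer(('127.0.0.1', 8080),
                               SimpleHTTPServer.SimpleHTTPRequestHandler)
httpd.serve_forever()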



The backend

It took me some time to understand the wiring of the system. I treated it as a black box, without trying to deeply understand its internals: I just worked out the stages of the processing and the input and output of each stage. The first input of the process is the CSV file containing the Milstein collection records; the final output is the static site.
Dan's software does many interesting things I did not use for my project, like OCR and borough-based geocoding; I focused just on what I needed: scraping, geocoding and presentation.
I sketched a logical schema that helped me keep a view of the process, the files involved and the commands invoked. Sorry if it is not crystal clear, it was not intended for teaching purposes; it also lacks some stages, like photo fetching and thumbnail generation.


The sketch of the logical schema


These are the main stages of the process:
  • The csv-to-record.py command transforms the input file Milstein.csv into a serialized Python object, records.pickle; look inside Record.py to see how the CSV columns map onto the object attributes. In my case I substituted Milstein.csv with facebook-posts.csv (a minimal sketch of this stage follows the list).
  • The expand-pickle.py command transforms records.pickle, which holds multiple photos per record, into photos.pickle, which holds one record per photo. As I did not start from the Milstein.csv file, in my case records.pickle was already one record per photo, therefore in the rest of the process I used records.pickle instead of photos.pickle.
  • The generate-geocodes.py command creates the files that associate photos with latitude and longitude: locations.txt and nyc-lat-lons-ny.js. The command is flexible enough to accept different kinds of geocoders as an argument. For my purposes I created ravenna_coder.py, which is tailored to the city of Ravenna: it extracts the address from the image description and passes it to the Google geocoding API (see the geocoder sketch below). The geocoder keeps a cache in geocache/ so that already-geocoded addresses are not sent to Google again.
  • The cluster-locations.py command clusters the locations of photos that are next to each other and outputs lat-lon-map.txt, which allows multiple photos to be grouped into a single point on the map (see the clustering sketch below).
  • The generate_static_site.py command creates the files needed for the static site to work.
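
To make the first stage concrete, here is a minimal sketch of what a csv-to-record step does. The class attributes and CSV column names here are illustrative assumptions of mine, not the actual ones (those are defined in the repo):

import csv
import pickle

class Record(object):
    """Stand-in for the project's Record class (the real attributes differ)."""
    def __init__(self, photo_id, description, date, url):
        self.photo_id = photo_id
        self.description = description
        self.date = date
        self.url = url

records = []
with open('facebook-posts.csv') as f:
    for row in csv.DictReader(f):  # hypothetical column names
        records.append(Record(row['id'], row['message'],
                              row['created_time'], row['permalink_url']))

with open('records.pickle', 'wb') as f:
    pickle.dump(records, f)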
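The heart of ravenna_coder.py is address extraction plus a cached call to the Google geocoding API. The sketch below only illustrates that idea: the real coder has to implement the interface that generate-geocodes.py expects (see the coders in the oldnyc repo), and the address heuristic here is a made-up simplification:

import json
import os
import re
import urllib

GEOCACHE_DIR = 'geocache'  # the cache directory created during setup

def extract_address(description):
    # Naive heuristic: grab a 'via/viale/piazza ...' phrase from the text.
    m = re.search(r'\b(via|viale|piazza)\s+[A-Za-z ]+', description, re.I)
    return m.group(0) if m else None

def geocode(address):
    # One file per address in the cache, so Google is queried only once.
    cache_file = os.path.join(GEOCACHE_DIR, urllib.quote(address, safe=''))
    if os.path.exists(cache_file):
        with open(cache_file) as f:
            data = json.load(f)
    else:
        url = ('https://maps.googleapis.com/maps/api/geocode/json?' +
               urllib.urlencode({'address': address + ', Ravenna, Italy'}))
        data = json.load(urllib.urlopen(url))  # add a key param if required
        with open(cache_file, 'w') as f:
            json.dump(data, f)
    if data['status'] != 'OK':
        return None
    loc = data['results'][0]['geometry']['location']
    return loc['lat'], loc['lng']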
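As for clustering, I have not reproduced the actual cluster-locations.py algorithm here; the sketch below just shows the simplest way to get the same effect, snapping coordinates that fall in the same small grid cell onto one shared point:

def cluster(lat_lons, precision=4):
    """Map each (lat, lon) to a canonical point for its grid cell.
    Rounding to 4 decimals groups points within roughly 10-20 metres."""
    canonical = {}  # grid cell -> first (lat, lon) seen in that cell
    mapping = {}    # original (lat, lon) -> clustered (lat, lon)
    for lat, lon in lat_lons:
        cell = (round(lat, precision), round(lon, precision))
        canonical.setdefault(cell, (lat, lon))
        mapping[(lat, lon)] = canonical[cell]
    return mapping

In the real pipeline, lat-lon-map.txt plays the role of this mapping, written out as text.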


I modified the code of many classes to fit my needs and collected all the commands needed to go from Facebook scraping to site publishing in a single script, do_all.sh:

./facebook_fetcher.py                   # fetch the posts from Facebook

cd nyc
./csv-to-record-facebook.py             # generate records.pickle from the posts
cd ..

./generate-geocodes.py --coders ravenna_coder --pickle_path nyc/records.pickle --output_format locations.txt --geocode -n > locations.txt

./cluster-locations.py locations.txt > lat-lon-map.txt

./generate-geocodes.py --coders ravenna_coder --pickle_path nyc/records.pickle --lat_lon_map lat-lon-map.txt --output_format lat-lons-ny.js --geocode > viewer/static/js/nyc-lat-lons-ny.js

cp ./nyc/records.pickle records.pickle  # temporary copy needed by image_fetcher and image_thumbnailer

./image_fetcher.py -n 12000             # download the images in order to create the thumbnails

./image_thumbnailer.py                  # create the thumbnails

./extract-sizes.py > nyc-image-sizes.txt

rm records.pickle                       # remove the temporary copy

cd nyc
./generate_popular.py                   # collect the most popular photos (by number of Facebook shares) shown in the right column of the website
./generate_fb_links.py                  # create fb_links.js with the URLs of the Facebook photos; OldNYC does not need a separate URL file because its URLs are built by appending the image ID to the base address of the NYPL image repository
cd ..

cd ../oldnyc.github.io
git add .
git commit -m 'photo update'            # generate_static_site.py requires a clean state of the repo

cd ../oldnyc
./generate_static_site.py

cp ./thumbnails/* ../oldnyc.github.io/thumb/
cp ./viewer/static/js/* ../oldnyc.github.io/js/

cd ../oldnyc.github.io/
./update-js-bundle.sh                   # merge the JS files needed by the frontend; the list of files to merge is in files.txt

sudo systemctl restart apache2.service  # restart the test web server to refresh the site
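
For reference, the core of facebook_fetcher.py is a paged query against the Facebook Graph API. This is only a sketch of the idea: the field list, API version and token handling are assumptions of mine based on the Graph API as it was in 2016, not the actual script:

import json
import urllib

ACCESS_TOKEN = 'YOUR_PAGE_ACCESS_TOKEN'  # from your Facebook app
PAGE_ID = 'YOUR_PAGE_ID'

url = ('https://graph.facebook.com/v2.7/%s/posts?' % PAGE_ID +
       urllib.urlencode({
           'fields': 'id,message,created_time,full_picture,shares',
           'access_token': ACCESS_TOKEN,
       }))

posts = []
while url:
    data = json.load(urllib.urlopen(url))
    posts.extend(data.get('data', []))
    url = data.get('paging', {}).get('next')  # follow pagination links

print('%d posts fetched' % len(posts))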


After running the do_all.sh script the static site oldnyc.github.io is ready to be uploaded to your hosting provider.


The frontend - static site


Even though the variable part is generated by the backend via the do_all.sh script, there are a few things to change in order to reuse the static site oldnyc.github.io:
  • Create a new Facebook app and use its app ID in index.html and, if you scrape Facebook, in facebook_fetcher.py.
  • Get a Google Maps API key and insert it in index.html and viewer.js. Remember to do the domain verification in the Google Console; this allows your website to call the Google APIs.
  • If you want to collect user feedback on the photos, customize feedback.js and open a Firebase project.
  • Insert your website domain (e.g. www.oldra.it) in the following files: index.html, about.html, generate_static_site.py, viewer.js, social.js.
  • Customize the social sharing messages in index.html and social.js.

The final result is www.oldra.it

1 comment:

  1. Hi!

    I have the same idea using the images from the Helsinki City Museum. I've followed your "tutorial" (many thanks for that!) and got everything working more or less, but one thing is not working for me:

    The images from the museum have lat/lon coordinates already and I've added them to my CSV file, and I guess I could bypass the generate-geocodes bit with some CSV-to-JSON parsing, but when I look at the code for generating geocodes it should still generate the lat-lons-ny.js file with the supplied coordinates, yet nothing happens. Shouldn't the "FakeResponse" in geocoder.py do the trick, or am I missing something? (Disclaimer: I'm not a coder, so I could be way off!)

    If you have any hints or code that could help me generate lat-lons-ny.js from the CSV I would be very grateful!

    Thanks again!
