Friday, December 28, 2012

Fetch and plot data from the Google Ngram Viewer using Python

Take a look at this GitHub repo for a Python script that fetches data from the Google Ngram Viewer. The README file in the repo explains how to use the script, and the PLOTTING file explains how to plot the data in Python using pandas.
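As a rough idea of what the plotting step can look like, here is a minimal sketch. It assumes the fetched data is a CSV with a `year` column plus one column per ngram; that layout, the sample values, and the function names `load_ngram_csv` and `plot_ngrams` are my own placeholders, so check the PLOTTING file for the actual instructions.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the script's output. The real file
# layout may differ, so adjust the column names to match what you get.
SAMPLE_CSV = """year,python,perl
2000,0.00012,0.00030
2001,0.00014,0.00028
2002,0.00017,0.00025
"""


def load_ngram_csv(source):
    """Read ngram frequencies into a DataFrame indexed by year."""
    return pd.read_csv(source, index_col="year")


def plot_ngrams(df):
    """Draw one line per ngram column (needs matplotlib installed)."""
    ax = df.plot(title="Ngram frequency by year")
    ax.set_xlabel("Year")
    ax.set_ylabel("Relative frequency")
    return ax
```

To try it, call `plot_ngrams(load_ngram_csv(io.StringIO(SAMPLE_CSV)))` and then `matplotlib.pyplot.show()`, or pass a path to your own CSV file instead of the sample.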

The script is a modified version of the one from the culturomics.org website.

To see an example of the type of research that's being done using the Google Ngram Viewer, check out this TED Talk by the creators of culturomics.org at Harvard.

NOTE: Be nice to the Google servers and don't beat them to death with this script.
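One simple way to go easy on the servers is to pause between requests. The sketch below is not part of the repo: `fetch_politely` is a hypothetical helper, `fetch` stands in for whatever function actually performs a request, and the two-second default delay is an arbitrary choice.

```python
import time


def fetch_politely(queries, fetch, delay_seconds=2.0, sleep=time.sleep):
    """Call fetch(query) for each query, pausing between requests.

    `fetch` is a placeholder for the function that performs one request.
    `sleep` is injectable so the pause can be replaced when testing.
    """
    results = {}
    for i, query in enumerate(queries):
        if i > 0:
            sleep(delay_seconds)  # pause before every request after the first
        results[query] = fetch(query)
    return results
```

Fetching a short list of ngrams this way takes a little longer, but it keeps the request rate low and predictable.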

Monday, December 17, 2012

Use Python to scrape data from Newegg and store it in an SQLite database

Check out this repo of mine on GitHub for a set of Python scripts that scrape data from Newegg.com and store it in an SQLite database.

The scripts use the mobile Newegg.com site to retrieve the list of all the products in a category. They then use the product ID of each product in that category to fetch and parse data from the Newegg JSON API, transform it into a pandas DataFrame, and dump it into a table in the SQLite database.

The mobile Newegg.com site is used because it is lighter weight than the desktop version of the site.
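The last part of that pipeline (parsed JSON to DataFrame to SQLite table) can be sketched roughly as follows. This is not the code from the repo: the sample payloads, the field names `product_id`, `title`, and `price`, and the helper names are all made up for illustration, and the real API responses contain different fields. The network requests are left out entirely.

```python
import json
import sqlite3

import pandas as pd

# Hypothetical payloads standing in for per-product API responses.
SAMPLE_RESPONSES = [
    '{"product_id": "A-001", "title": "Example CPU 1", "price": 199.99}',
    '{"product_id": "A-002", "title": "Example CPU 2", "price": 329.50}',
]

FIELDS = ["product_id", "title", "price"]  # assumed field names


def parse_product(raw_json):
    """Keep only the fields of interest from one JSON response."""
    data = json.loads(raw_json)
    return {field: data.get(field) for field in FIELDS}


def store_products(conn, table, raw_responses):
    """Build a DataFrame from the responses and append it to `table`."""
    records = [parse_product(raw) for raw in raw_responses]
    frame = pd.DataFrame(records, columns=FIELDS)
    # Creates the table on first use, then appends on later runs.
    frame.to_sql(table, conn, if_exists="append", index=False)
    return frame
```

For example, `store_products(sqlite3.connect("newegg.db"), "desktop_cpus", SAMPLE_RESPONSES)` writes the two sample rows to a `desktop_cpus` table; both the file name and the table name here are placeholders.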

As of right now, I have scripts in the GitHub repo set up to collect data on the following:

  • Desktop CPUs
  • Desktop Memory
  • Hard Drives
  • Laptops
  • LCD/LED/Plasma TVs
  • PS3 Games
  • Xbox 360 Games

Each script dumps the data into a separate table in the SQLite database file. Feel free to tweak these scripts however you like, perhaps to retrieve different data from the Newegg JSON API for each product, or even to grab data from another set of products on Newegg. Enjoy!