
How to store data from Google Ngram API?

I need to store the data presented in the graphs on the Google Ngram website. For example, I want to store the occurrences of "it's" as a percentage for each year from 1800 to 2008, as presented in the following link: https://books.google.com/ngrams/graph?content=it%27s&year_start=1800&year_end=2008&corpus=0&smoothing=3&share=&direct_url=t1%3B%2Cit%27s%3B%2Cc0 .

The data I want is the data you can see when hovering over the graph. How can I extract this for about 140 different terms (e.g., "it's", "they're", "she's")?

econpy wrote a nice little module in Python that you can use through a command-line interface.

For your "it's" example, you would type this command in a terminal or Windows console:

python getngrams.py "it's" -startYear=1800 -endYear=2008 -corpus=eng_2009 -smoothing=3

This will automatically save the query result in a CSV file named after your query parameters.
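To cover all 140 or so terms, the same command could be run in a loop. Here is a minimal sketch, assuming a working getngrams.py sits in the current directory and accepts the flags shown above. The term list and the helper names build_command and run_all are hypothetical, not part of econpy's module. Passing the arguments as a list to subprocess.run means no shell is involved, so the apostrophes in the terms need no quoting.

```python
import subprocess
import sys

# Hypothetical term list; extend it to the full set of terms you need.
TERMS = ["it's", "they're", "she's"]


def build_command(term, start=1800, end=2008, corpus="eng_2009", smoothing=3,
                  script="getngrams.py"):
    """Return the argument list for one getngrams.py call (flags as in the command above)."""
    return [
        sys.executable, script, term,
        f"-startYear={start}",
        f"-endYear={end}",
        f"-corpus={corpus}",
        f"-smoothing={smoothing}",
    ]


def run_all(terms, runner=subprocess.run):
    """Run the script once per term and return the terms whose call failed."""
    failed = []
    for term in terms:
        result = runner(build_command(term))
        if result.returncode != 0:
            failed.append(term)
    return failed


# Usage (assumes getngrams.py is present in the working directory):
#   failed = run_all(TERMS)
#   print("failed terms:", failed)
```

Each successful call should leave one CSV file per term, as described above; the returned list shows which terms would need a retry.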

econpy's package, mentioned in @HugoMailhot's answer, no longer works (as of 2021) and appears to be unmaintained. Here's an updated version, with some improvements for easier integration into Python code: https://gitlab.com/cpbl/google-ngrams

You can call this from the command line (as with econpy's) to create a CSV file, e.g.:

getngrams.py "it's" -startYear=1800 -endYear=2008 -corpus=eng_2009 -smoothing=3

or call it from Python to get (and plot) the data directly, e.g.:

from getngrams import ngrams
df = ngrams('bells and whistles -startYear=1900 -endYear=2018 -smoothing=2')
df.plot()
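For the roughly 140 terms in the question, the Python entry point could be called in a loop. The sketch below assumes that ngrams accepts a query string in the format shown above and returns a pandas DataFrame indexed by year (the df.plot() call suggests this, but it is not verified here). build_query and fetch_all are hypothetical helpers, and the pause between requests is an assumed courtesy delay, not a documented requirement.

```python
import time


def build_query(term, start=1800, end=2008, smoothing=3):
    """Format one query string in the style used by the ngrams() call above."""
    return f"{term} -startYear={start} -endYear={end} -smoothing={smoothing}"


def fetch_all(terms, fetch, pause=1.0):
    """Call fetch(query) once per term and return {term: result}.

    fetch is expected to be getngrams.ngrams; it is passed in so the loop
    can be tried with a stand-in function first.
    """
    results = {}
    for i, term in enumerate(terms):
        if i > 0:
            time.sleep(pause)  # assumed courtesy delay between requests
        results[term] = fetch(build_query(term))
    return results


# Usage (assumes the getngrams module and pandas are installed):
#   import pandas as pd
#   from getngrams import ngrams
#   frames = fetch_all(["it's", "they're", "she's"], ngrams)
#   pd.concat(frames, axis=1).to_csv("contractions.csv")
```

Keeping the results in a dict keyed by term makes it easy to see which terms came back and to combine them into a single table afterwards.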

The xkcd functionality is still there too.

(Issues, bug-fix pull requests, etc. are welcome there.)
