
Using chunks with json/requests to get large data into Python

I am trying to pull a large dataset into Python using an API, but I am not able to get the entire dataset. The request only returns the first 1000 rows.

import requests
import pandas as pd

r = requests.get("https://data.cityofchicago.org/resource/6zsd-86xi.json")
data = r.json()
df = pd.DataFrame(data)
df.drop(df.columns[[0, 1, 2, 3, 4, 5, 6, 7]], axis=1, inplace=True)  # dropping some columns
df.shape

The output is

(1000,22)

The dataset contains almost 6 million rows, yet only 1000 are retrieved. How do I get around this? Is chunking the right option? Can someone please help me with the code?

Thanks.

You'll need to paginate through the results to get the entire dataset. Most APIs limit the number of results returned in a single request. According to the Socrata docs, you need to add $limit and $offset parameters to the request URL.

For example, for the first page of results you would start with - https://data.cityofchicago.org/resource/6zsd-86xi.json?$limit=1000&$offset=0

Then for the next page you would just increment the offset - https://data.cityofchicago.org/resource/6zsd-86xi.json?$limit=1000&$offset=1000

Continue incrementing until you have the entire dataset.
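A minimal sketch of that loop, assuming the endpoint accepts anonymous requests and using a page size of 1000 (Socrata also supports an app token and larger $limit values, which would reduce the number of requests):

import requests
import pandas as pd

url = "https://data.cityofchicago.org/resource/6zsd-86xi.json"
limit = 1000   # rows per request; adjust if the endpoint allows more
offset = 0
frames = []

while True:
    r = requests.get(url, params={"$limit": limit, "$offset": offset})
    r.raise_for_status()
    page = r.json()
    if not page:               # empty page: nothing left to fetch
        break
    frames.append(pd.DataFrame(page))
    if len(page) < limit:      # short page: reached the end of the dataset
        break
    offset += limit

df = pd.concat(frames, ignore_index=True)
print(df.shape)

With ~6 million rows this will take many requests, so expect it to run for a while; increasing the page size (if the API permits) or filtering the query server-side will speed it up.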
