How should I read this link with Python? Pandas or BS4?

I am trying to read the following link with Pandas:

http://api.eia.gov/series/?api_key=3d82a096b5e846caa05ddc8e747a7fd&series_id=PET.WGIRIUS2.W

I tried pd.read_json(), which raised an error: ValueError: Mixing dicts with non-Series may lead to ambiguous ordering.

I also tried pd.read_csv(), which returns a DataFrame without any rows, with all of the table values packed into the column names.

This is the first part of my code:

import pandas as pd

eia_key='<private>'

def link_category(id_number):
    return 'http://api.eia.gov/category/?api_key='+eia_key+'&category_id='+id_number

def link_series(id_number):
    return 'http://api.eia.gov/series/?api_key='+eia_key+'&series_id='+id_number


'''U.S. Gross Inputs into Refineries, Weekly'''

page=link_series('PET.WGIRIUS2.W')

Then I try:

df=pd.read_csv(page)

and I get a mess, with all of the table values as column names. But if I try

df=pd.read_json(page)

I get the error mentioned above.

Any suggestions on the best way to read these EIA datasets with Python? I am open to using another library, like BS4, if that would be better.

Thank you in advance!!!!

I believe you just want the data field out of the response.

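import requests

# page is the URL built by link_series() in the question
d = requests.get(page).json()  # parse the JSON body into a dict
# the observations sit under the first entry of the 'series' list
df = pd.DataFrame(d['series'][0]['data'])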

df should then hold the data you want.
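To go one step further, here is a minimal sketch that gives the columns names and parses the dates. It rests on assumptions I have not verified against the live API: that each entry in 'data' is a [date, value] pair, that weekly dates are strings like '20190104', and that the top-level object also carries a 'request' dict next to the 'series' list (which would explain why pd.read_json complains about mixing dicts with non-Series). The sample_payload and its values are made-up placeholders, and series_to_frame is a hypothetical helper name.

```python
import pandas as pd

# Made-up placeholder payload imitating the ASSUMED response layout.
# The values 100 and 200 are not real EIA figures.
sample_payload = {
    "request": {"command": "series", "series_id": "PET.WGIRIUS2.W"},
    "series": [
        {
            "series_id": "PET.WGIRIUS2.W",
            "data": [["20190111", 100], ["20190104", 200]],
        }
    ],
}


def series_to_frame(payload):
    """Turn the 'data' list of the first series into a date-indexed DataFrame."""
    rows = payload["series"][0]["data"]
    frame = pd.DataFrame(rows, columns=["date", "value"])
    # Assumption: weekly dates arrive as YYYYMMDD strings.
    frame["date"] = pd.to_datetime(frame["date"], format="%Y%m%d")
    # Sort oldest first and use the date as the index.
    return frame.sort_values("date").set_index("date")


df = series_to_frame(sample_payload)
print(df)
```

With a live response you would pass requests.get(page).json() instead of sample_payload. If the date format turns out to be different, only the format string in pd.to_datetime should need changing.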

