Parsing data from json.loads(), in python

I'm attempting to analyze data from a website. I parsed the HTML and used json.loads() to get the JSON data.

data = json.loads(soup.find('script', type='application/ld+json').text)

So now I'm left with data which resembles the following:

data = {
    'aggregateRating': {'reviewCount': 1691,
                        '@type'      : 'AggregateRating',
                        'ratingValue': 4.0},
    'review': [{'reviewRating' : {'ratingValue': 5},
                'datePublished': '2017-10-31',
                'description'  : "I had a chance to see the Lakers ...",
                'author'       : 'Andre W.'}]
}

I am interested in returning the 'ratingValue' integer from 'reviewRating' in the 'review' array. When I run this script:

pd.DataFrame(data['review'], columns = ['reviewRating'])

I get this:

    reviewRating
0   {'ratingValue': 5}

Instead, I'm looking to get data in the form of:

    ratingValue
0   5

I've attempted several variations, such as:

pd.DataFrame(data['review'], columns = ['reviewRating']['ratingValue'])
pd.DataFrame(data['review'], columns = ['reviewRating'][['ratingValue']])
pd.DataFrame(data['review']['reviewRating'], columns = ['ratingValue'])

But I'm sure I don't understand the underlying structure of the data, or pandas.

Thus, am I better off cleaning {'ratingValue': 5} as a string in order to be left with the integer of interest, or is there an easy way to create a DataFrame that has the integer value of 'ratingValue'?

Thanks.

If you use json_normalize from pandas.io.json (exposed as pd.json_normalize since pandas 1.0), you can create the DataFrame directly from the parsed JSON.

Using your sample data, I was able to output:

>>> frame = json_normalize(data['review'])  # normalize the list of review records

     author datePublished                           description  \
0  Andre W.    2017-10-31  I had a chance to see the Lakers ...

   reviewRating.ratingValue
0                         5
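For reference, a self-contained version of that call might look like the sketch below. It assumes pandas 1.0 or later, where the function is available as `pd.json_normalize`, and it uses the sample data from the question written as the dict that json.loads() would return. Column order may differ from the output shown above depending on the pandas version.

```python
import pandas as pd

# Sample data from the question, as the dict that json.loads() returns.
data = {
    "aggregateRating": {
        "reviewCount": 1691,
        "@type": "AggregateRating",
        "ratingValue": 4.0,
    },
    "review": [
        {
            "reviewRating": {"ratingValue": 5},
            "datePublished": "2017-10-31",
            "description": "I had a chance to see the Lakers ...",
            "author": "Andre W.",
        }
    ],
}

# Flatten each review record; nested keys become dotted column names
# such as "reviewRating.ratingValue".
frame = pd.json_normalize(data["review"])
print(frame.columns.tolist())
```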

And then you can access the rating value using:

frame.at[0, 'reviewRating.ratingValue'] # which should give you 5
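To get the single-column layout asked for in the question, one possible approach is to keep only the flattened column and rename it. The sketch below assumes pandas 1.0 or later (`pd.json_normalize`); the variable names `reviews`, `ratings` and `ratings_plain` are illustrative. It also shows a plain-Python alternative that skips json_normalize entirely.

```python
import pandas as pd

# The 'review' list from the question's sample data.
reviews = [
    {
        "reviewRating": {"ratingValue": 5},
        "datePublished": "2017-10-31",
        "description": "I had a chance to see the Lakers ...",
        "author": "Andre W.",
    }
]

# Option 1: flatten, keep only the nested rating column, and rename it.
ratings = (
    pd.json_normalize(reviews)[["reviewRating.ratingValue"]]
    .rename(columns={"reviewRating.ratingValue": "ratingValue"})
)

# Option 2: pull the integers out with a list comprehension.
ratings_plain = pd.DataFrame(
    {"ratingValue": [r["reviewRating"]["ratingValue"] for r in reviews]}
)

print(ratings)
```

Note that data['review'] is a list, so it has to be iterated (or indexed by position) before 'reviewRating' can be looked up; indexing the list with a string key, as in the attempts from the question, raises a TypeError.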
