Parsing data from json.loads() in Python
I'm attempting to analyze data from a website. I parsed the HTML and used json.loads() to get the JSON data.
data = json.loads(soup.find('script', type='application/ld+json').text)
So now I'm left with data which resembles the following:
data = {'aggregateRating': {'reviewCount': 1691,
                            '@type': 'AggregateRating',
                            'ratingValue': 4.0},
        'review': [{'reviewRating': {'ratingValue': 5},
                    'datePublished': '2017-10-31',
                    'description': "I had a chance to see the Lakers ...",
                    'author': 'Andre W.'}]}
I am interested in returning the 'ratingValue' integer from 'reviewRating' in the 'review' array. When I run this script:
pd.DataFrame(data['review'], columns = ['reviewRating'])
I get this:
reviewRating
0 {'ratingValue': 5}
Instead, I'm looking to get data in the form of:
ratingValue
0 5
I've attempted several variations, such as:
pd.DataFrame(data['review'], columns = ['reviewRating']['ratingValue'])
pd.DataFrame(data['review'], columns = ['reviewRating'][['ratingValue']])
pd.DataFrame(data['review']['reviewRating'], columns = ['ratingValue'])
But I'm sure I don't understand the underlying structure of the data, or of pandas.
Thus, am I better off cleaning {'ratingValue': 5} as a string in order to be left with the integer of interest, or is there an easy way to create a DataFrame that has the integer value of 'ratingValue'?
Thanks.
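A note on the structure, since that is where the attempts above seem to go wrong: data['review'] is a list of dicts, so it has to be indexed by position or looped over, and ['reviewRating']['ratingValue'] indexes a plain Python list with a string before pandas ever sees it. The sketch below uses only the standard library and a copy of the sample data; the variable names (columns, ratings, the two *_error strings) are made up for illustration.

```python
# Sample shaped like the question's data (json.loads already returned a dict).
data = {
    "aggregateRating": {
        "reviewCount": 1691,
        "@type": "AggregateRating",
        "ratingValue": 4.0,
    },
    "review": [
        {
            "reviewRating": {"ratingValue": 5},
            "datePublished": "2017-10-31",
            "description": "I had a chance to see the Lakers ...",
            "author": "Andre W.",
        }
    ],
}

# data["review"] is a list, so a string key raises TypeError.
try:
    data["review"]["reviewRating"]
except TypeError as exc:
    list_index_error = str(exc)

# columns = ['reviewRating']['ratingValue'] fails for the same reason:
# it indexes a one-element list with a string.
columns = ["reviewRating"]
try:
    columns["ratingValue"]
except TypeError as exc:
    columns_error = str(exc)

# Walk the list and pull the nested value out of each review instead.
ratings = [review["reviewRating"]["ratingValue"] for review in data["review"]]
print(ratings)  # [5]
```

Passing that list to pd.DataFrame(ratings, columns=['ratingValue']) should give the one-column frame asked for, with no string cleaning needed.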
If you use json_normalize from pandas.io.json, you can create the DataFrame directly from the JSON.
Using your sample data, I was able to output:
>>> from pandas.io.json import json_normalize
>>> frame = json_normalize(data['review'])
>>> frame
     author datePublished                           description  \
0  Andre W.    2017-10-31  I had a chance to see the Lakers ...

   reviewRating.ratingValue
0                         5
And then you can access the rating value using:
frame.at[0, 'reviewRating.ratingValue'] # which should give you 5
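To get a frame in exactly the form shown in the question (a single ratingValue column), one option is to select the flattened column and rename it. This is a minimal sketch, assuming pandas 1.0 or later, where the function is exposed as pd.json_normalize (the pandas.io.json import path was deprecated in that release); the sample dict is trimmed to the 'review' key and the ratings name is only illustrative.

```python
import pandas as pd

# Sample data trimmed to the part of the question's dict that is used here.
data = {
    "review": [
        {
            "reviewRating": {"ratingValue": 5},
            "datePublished": "2017-10-31",
            "description": "I had a chance to see the Lakers ...",
            "author": "Andre W.",
        }
    ]
}

# Flatten each review dict; nested keys become dotted column names.
frame = pd.json_normalize(data["review"])

# Keep only the nested rating and give it the short name from the question.
ratings = frame[["reviewRating.ratingValue"]].rename(
    columns={"reviewRating.ratingValue": "ratingValue"}
)
print(ratings)
```

Because each review becomes one row, the same code should carry over unchanged when the 'review' list holds more than one entry.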