JSON file to Pandas df
I'm trying to convert a JSON file into a pandas DataFrame so that I can remove the unwanted data and reduce it to a CSV of IDs. The data looks like this:
{
"data": [
{
"message": "Uneeded message",
"created_time": "2017-04-02T17:20:37+0000",
"id": "723456782912449_1008262099345654"
},
{
"message": "Uneeded message",
"created_time": "2017-03-28T06:26:28+0000",
"id": "771345678912449_1003934567871010"
},
I haven't used JSON before, but the code I've used to load this data is:
import pandas as pd
import json
with open('fileName.json', encoding="utf8") as f:
    w = json.loads(f.read(), strict=False)
The final output should just be a CSV with a single column of IDs.
I think you need json_normalize:
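# pandas >= 1.0 exposes json_normalize at the top level;
# on older versions use: from pandas.io.json import json_normalize
from pandas import json_normalize
import json
with open('file.json') as data_file:
    d = json.load(data_file)
print(d)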
{
"data": [{
"message": "Uneeded message",
"created_time": "2017-04-02T17:20:37+0000",
"id": "723456782912449_1008262099345654"
}, {
"message": "Uneeded message",
"created_time": "2017-03-28T06:26:28+0000",
"id": "771345678912449_1003934567871010"
}]
}
df = json_normalize(d, 'data')
print (df)
               created_time                                id          message
0  2017-04-02T17:20:37+0000  723456782912449_1008262099345654  Uneeded message
1  2017-03-28T06:26:28+0000  771345678912449_1003934567871010  Uneeded message
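The question asks for a CSV containing only the IDs, which the snippet above stops short of. One possible continuation is sketched below, assuming pandas 1.0 or later (for pd.json_normalize); the inline dict simply stands in for the loaded file, and the variable names ids and csv_text are illustrative rather than part of the original answer.

```python
import pandas as pd

# Sample data standing in for the parsed JSON file (same shape as in the question).
d = {
    "data": [
        {"message": "Uneeded message",
         "created_time": "2017-04-02T17:20:37+0000",
         "id": "723456782912449_1008262099345654"},
        {"message": "Uneeded message",
         "created_time": "2017-03-28T06:26:28+0000",
         "id": "771345678912449_1003934567871010"},
    ]
}

# Flatten the list stored under "data" into one row per record.
df = pd.json_normalize(d, "data")

# Keep only the id column; the double brackets keep the result a DataFrame.
ids = df[["id"]]

# With no path argument, to_csv returns the CSV text instead of writing a file.
# index=False leaves out the 0, 1, ... row labels.
csv_text = ids.to_csv(index=False)
print(csv_text)
```

Passing a file name as the first argument, for example ids.to_csv('ids.csv', index=False), should write the same content to disk; the name ids.csv is only an assumption here.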
Using json.loads
Setup
json_str = """{
"data": [
{
"message": "Uneeded message",
"created_time": "2017-04-02T17:20:37+0000",
"id": "723456782912449_1008262099345654"
},
{
"message": "Uneeded message",
"created_time": "2017-03-28T06:26:28+0000",
"id": "771345678912449_1003934567871010"
}]}"""
Solution
import json
import pandas as pd
pd.DataFrame(json.loads(json_str)['data'])
               created_time                                id          message
0  2017-04-02T17:20:37+0000  723456782912449_1008262099345654  Uneeded message
1  2017-03-28T06:26:28+0000  771345678912449_1003934567871010  Uneeded message
Or with the JSON in a file:
with open('neutraluk1.json') as f:
    print(pd.DataFrame(json.load(f)['data']))
               created_time                                id          message
0  2017-04-02T17:20:37+0000  723456782912449_1008262099345654  Uneeded message
1  2017-03-28T06:26:28+0000  771345678912449_1003934567871010  Uneeded message
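Putting the file-based variant together with the CSV step from the question, a file-to-file version might look roughly like the sketch below. The helper name json_ids_to_csv and both path parameters are hypothetical, and it assumes the file has the {"data": [...]} layout shown in the question with at least one record carrying an "id" key (otherwise the column selection raises a KeyError).

```python
import json

import pandas as pd


def json_ids_to_csv(json_path, csv_path):
    """Hypothetical helper: read {"data": [...]} from json_path and
    write a one-column CSV of ids to csv_path. Returns the row count."""
    with open(json_path, encoding="utf8") as f:
        # strict=False (as in the question) tolerates raw control
        # characters inside JSON strings.
        records = json.load(f, strict=False)["data"]

    df = pd.DataFrame(records)

    # Keep only the id column and drop records that had no id.
    ids = df[["id"]].dropna()

    # index=False leaves out the 0, 1, ... row labels.
    ids.to_csv(csv_path, index=False)
    return len(ids)


# Example call (file names are placeholders):
# json_ids_to_csv("fileName.json", "ids.csv")
```

Returning the number of rows written is just a convenience for checking the result; the message and created_time columns are discarded by the column selection, so no separate cleanup step is needed.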