Python: reading a JSON string with headers in the initial part
I am trying to grab output from a package (defined in the package documentation as 'jsonDICT') and eventually write it as CSV. I will call this PackResult, and it is a dictionary.
The first and last few characters of print(PackResult) look like this:
{'startDate': '2019-11-01T00:00:00', 'endDate': '2020-03-31T00:00:00', 'timezone': 'UTC', 'groupBy': 'DAILY', 'numberOfDocuments': 34486, 'volume':
[{'startDate': '2019-11-01T00:00:00', 'endDate': '2019-11-02T00:00:00', 'numberOfDocuments': 0},
{'startDate': '2019-11-02T00:00:00', 'endDate': '2019-11-03T00:00:00', 'numberOfDocuments': 1},
{'startDate': '2019-11-03T00:00:00', 'endDate': '2019-11-04T00:00:00', 'numberOfDocuments': 0}
...
{'startDate': '2020-03-30T00:00:00', 'endDate': '2020-03-31T00:00:00', 'numberOfDocuments': 1389}], 'status': 'success'}
So the first part of the string contains "sample" column headers, and then, once the left bracket is encountered, the actual values are presented with their respective column headers.
I am trying to use pandas to do the heavy lifting, but I cannot seem to get it to read the first set of headers and then the data. Essentially,
import pandas as pand
import json
df = pand.read_json(json.dumps(PackResult), orient='records', typ='series')
print(df)
gives me this:
startDate 2019-11-01T00:00:00
endDate 2020-03-31T00:00:00
timezone UTC
groupBy DAILY
numberOfDocuments 34486
volume [{'startDate': '2019-11-01T00:00:00', 'endDate...
status success
and
df = pand.read_json(json.dumps(PackResult), orient='records', typ='frame')
gives me:
startDate endDate timezone groupBy numberOfDocuments volume status
0 2019-11-01T00:00:00 2020-03-31T00:00:00 UTC DAILY 34486 {'startDate': '2019-11-01T00:00:00', 'endDate'... success
1 2019-11-01T00:00:00 2020-03-31T00:00:00 UTC DAILY 34486 {'startDate': '2019-11-02T00:00:00', 'endDate'... success
2 2019-11-01T00:00:00 2020-03-31T00:00:00 UTC DAILY 34486 {'startDate': '2019-11-03T00:00:00', 'endDate'... success
3 2019-11-01T00:00:00 2020-03-31T00:00:00 UTC DAILY 34486 {'startDate': '2019-11-04T00:00:00', 'endDate'... success
4 2019-11-01T00:00:00 2020-03-31T00:00:00 UTC DAILY 34486 {'startDate': '2019-11-05T00:00:00', 'endDate'... success
What am I missing?
Thanks in advance.
Ah. A new day and some rest gave me the obvious thing I was missing:
df = pand.read_json(json.dumps(PackResult["volume"]), orient='records', typ='frame')
This results in
# startDate endDate numberOfDocuments
0 2019-11-01T00:00:00 2019-11-02T00:00:00 0
1 2019-11-02T00:00:00 2019-11-03T00:00:00 1
2 2019-11-03T00:00:00 2019-11-04T00:00:00 0
3 2019-11-04T00:00:00 2019-11-05T00:00:00 0
4 2019-11-05T00:00:00 2019-11-06T00:00:00 0
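Since PackResult is already a dictionary, the round trip through json.dumps and read_json may not be needed. The following is only a sketch under assumptions: pack_result is an abbreviated stand-in built from the rows shown in the question, the choice of "timezone" and "groupBy" as carried-over fields is illustrative, and pd.json_normalize assumes pandas 1.0 or later. It builds the frame directly from the "volume" list, optionally keeps some top-level fields as extra columns, and produces the CSV text that was the original goal.

```python
import io

import pandas as pd

# Abbreviated stand-in for PackResult, copied from the rows shown in the question.
pack_result = {
    "startDate": "2019-11-01T00:00:00",
    "endDate": "2020-03-31T00:00:00",
    "timezone": "UTC",
    "groupBy": "DAILY",
    "numberOfDocuments": 34486,
    "volume": [
        {"startDate": "2019-11-01T00:00:00", "endDate": "2019-11-02T00:00:00", "numberOfDocuments": 0},
        {"startDate": "2019-11-02T00:00:00", "endDate": "2019-11-03T00:00:00", "numberOfDocuments": 1},
        {"startDate": "2019-11-03T00:00:00", "endDate": "2019-11-04T00:00:00", "numberOfDocuments": 0},
    ],
    "status": "success",
}

# A list of dicts maps directly onto rows, so no JSON round trip is required.
daily = pd.DataFrame(pack_result["volume"])

# Optional: repeat selected top-level fields on every row.
# The meta names must not clash with the record keys (startDate, endDate, ...).
with_meta = pd.json_normalize(
    pack_result,
    record_path="volume",
    meta=["timezone", "groupBy"],
)

# Write the CSV into an in-memory buffer; pass a file path instead to write to disk.
buffer = io.StringIO()
daily.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

If the top-level startDate, endDate, or numberOfDocuments are also wanted as columns, json_normalize accepts a meta_prefix argument to keep them from colliding with the per-day keys of the same name.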