
Python: reading a JSON string with headers in the initial part

I am trying to grab output from a package (defined in the package documentation as 'jsonDICT') and eventually write it as CSV. I will call this PackResult; it is a dictionary.

The first and last few characters of print(PackResult) look like this:

{'startDate': '2019-11-01T00:00:00', 'endDate': '2020-03-31T00:00:00', 'timezone': 'UTC', 'groupBy': 'DAILY', 'numberOfDocuments': 34486, 'volume':  
[{'startDate': '2019-11-01T00:00:00', 'endDate': '2019-11-02T00:00:00', 'numberOfDocuments': 0},  
 {'startDate': '2019-11-02T00:00:00', 'endDate': '2019-11-03T00:00:00', 'numberOfDocuments': 1},  
 {'startDate': '2019-11-03T00:00:00', 'endDate': '2019-11-04T00:00:00', 'numberOfDocuments': 0}  
...  
{'startDate': '2020-03-30T00:00:00', 'endDate': '2020-03-31T00:00:00', 'numberOfDocuments': 1389}], 'status': 'success'}  

So the first part of the string contains "sample" column headers, and then, once the opening square bracket is encountered, the actual values are presented, each with its respective column headers.
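Looking at the printed output, the top-level keys behave like one-off metadata for the whole query rather than column headers, while the list under 'volume' holds the per-day records whose keys are the real column headers. A minimal sketch making that split explicit with plain Python, assuming a hypothetical, truncated `pack_result` that copies the top-level values and the first two 'volume' records from the output above:

```python
import json

# Hypothetical, truncated stand-in for PackResult: top-level values and the
# first two "volume" records are copied from the printed output above.
pack_result = {
    "startDate": "2019-11-01T00:00:00",
    "endDate": "2020-03-31T00:00:00",
    "timezone": "UTC",
    "groupBy": "DAILY",
    "numberOfDocuments": 34486,
    "volume": [
        {"startDate": "2019-11-01T00:00:00", "endDate": "2019-11-02T00:00:00", "numberOfDocuments": 0},
        {"startDate": "2019-11-02T00:00:00", "endDate": "2019-11-03T00:00:00", "numberOfDocuments": 1},
    ],
    "status": "success",
}

# Every key except "volume" maps to a single value describing the whole query.
metadata = {key: value for key, value in pack_result.items() if key != "volume"}

# "volume" is a list of per-day records that all share the same keys,
# so the keys of any one record are the column headers of the table.
records = pack_result["volume"]
columns = list(records[0].keys())

print(json.dumps(metadata, indent=2))
print(columns)
```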

I am trying to use pandas to do the heavy lifting, but I cannot seem to get it to read the first set of headers and then the data. Essentially,

import json
import pandas as pand
# PackResult is the dictionary returned by the package
df = pand.read_json(json.dumps(PackResult), orient='records', typ='series')
print(df)

gives me this:

startDate                                          2019-11-01T00:00:00  
endDate                                            2020-03-31T00:00:00  
timezone                                                           UTC  
groupBy                                                          DAILY  
numberOfDocuments                                                34486  
volume               [{'startDate': '2019-11-01T00:00:00', 'endDate...  
status                                                         success

and

df = pand.read_json(json.dumps(PackResult), orient='records', typ='frame')

gives me:

startDate              endDate timezone groupBy  numberOfDocuments                                             volume   status  
0    2019-11-01T00:00:00  2020-03-31T00:00:00      UTC   DAILY              34486  {'startDate': '2019-11-01T00:00:00', 'endDate'...  success  
1    2019-11-01T00:00:00  2020-03-31T00:00:00      UTC   DAILY              34486  {'startDate': '2019-11-02T00:00:00', 'endDate'...  success  
2    2019-11-01T00:00:00  2020-03-31T00:00:00      UTC   DAILY              34486  {'startDate': '2019-11-03T00:00:00', 'endDate'...  success  
3    2019-11-01T00:00:00  2020-03-31T00:00:00      UTC   DAILY              34486  {'startDate': '2019-11-04T00:00:00', 'endDate'...  success  
4    2019-11-01T00:00:00  2020-03-31T00:00:00      UTC   DAILY              34486  {'startDate': '2019-11-05T00:00:00', 'endDate'...  success  

What am I missing?

Thanks in advance
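The repeated values in the second output look like pandas broadcasting: when a frame is built from the whole dictionary, the 'volume' list supplies one row per element and each scalar top-level value is repeated down every row, with each record left as a raw dict in the 'volume' column. A minimal sketch reproducing that pattern with the plain `pd.DataFrame` constructor (not `read_json` itself), assuming a hypothetical two-record excerpt of PackResult copied from the output above:

```python
import pandas as pd

# Hypothetical two-record excerpt with the same shape as PackResult.
pack_result = {
    "startDate": "2019-11-01T00:00:00",
    "endDate": "2020-03-31T00:00:00",
    "timezone": "UTC",
    "groupBy": "DAILY",
    "numberOfDocuments": 34486,
    "volume": [
        {"startDate": "2019-11-01T00:00:00", "endDate": "2019-11-02T00:00:00", "numberOfDocuments": 0},
        {"startDate": "2019-11-02T00:00:00", "endDate": "2019-11-03T00:00:00", "numberOfDocuments": 1},
    ],
    "status": "success",
}

# The "volume" list sets the row count (one row per record); every scalar
# value such as "timezone" or the total numberOfDocuments is broadcast
# down all rows, and each record stays an unexpanded dict in "volume".
df = pd.DataFrame(pack_result)
print(df[["startDate", "numberOfDocuments", "volume"]])
```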

Ah. A new day and some rest showed me the obvious thing I was missing: the per-day records live in the list under the "volume" key, so only that list should be passed to read_json:

df = pand.read_json(json.dumps(PackResult["volume"]), orient='records', typ='frame')

This results in:

#    startDate            endDate                 numberOfDocuments  
0    2019-11-01T00:00:00  2019-11-02T00:00:00                  0  
1    2019-11-02T00:00:00  2019-11-03T00:00:00                  1  
2    2019-11-03T00:00:00  2019-11-04T00:00:00                  0  
3    2019-11-04T00:00:00  2019-11-05T00:00:00                  0  
4    2019-11-05T00:00:00  2019-11-06T00:00:00                  0  
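Since the stated goal is a CSV file, here is a minimal sketch of the remaining steps, assuming the same hypothetical two-record excerpt of PackResult. Because PackResult["volume"] is already a list of dictionaries, it can go straight into the `pd.DataFrame` constructor without the `json.dumps`/`read_json` round trip. `pd.json_normalize` is shown as one optional way to carry top-level fields such as timezone along as extra columns; the output filename in the comment is hypothetical.

```python
import pandas as pd

# Hypothetical two-record excerpt with the same shape as PackResult.
pack_result = {
    "startDate": "2019-11-01T00:00:00",
    "endDate": "2020-03-31T00:00:00",
    "timezone": "UTC",
    "groupBy": "DAILY",
    "numberOfDocuments": 34486,
    "volume": [
        {"startDate": "2019-11-01T00:00:00", "endDate": "2019-11-02T00:00:00", "numberOfDocuments": 0},
        {"startDate": "2019-11-02T00:00:00", "endDate": "2019-11-03T00:00:00", "numberOfDocuments": 1},
    ],
    "status": "success",
}

# The records are already a list of dicts: one row per dict, keys as columns.
daily = pd.DataFrame(pack_result["volume"])

# Optional: flatten the records and append chosen top-level fields as columns.
# Only names that do not clash with record keys are listed; a clashing key
# such as "startDate" would need the meta_prefix argument.
with_meta = pd.json_normalize(pack_result, record_path="volume", meta=["timezone", "groupBy"])

# With no path, to_csv returns the CSV text; pass a path such as
# "daily_volume.csv" (hypothetical name) to write a file instead.
csv_text = daily.to_csv(index=False)
print(csv_text)
```

`index=False` leaves out the 0, 1, 2, ... row labels, which carry no information here.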
