
Parsing unstructured JSON into CSV

I have yearly application data for different apps in JSON format. There are 10 different JSON files for each application, and I am trying to merge them into a single CSV. Here is the data structure:

[{"date": "2017-10-23", "downloads": 15358985, "end": "2017-10-23", "data": {"2.7.3.4196-beta": 7, "1.0.1": 268, "1.0.2": 715, "2.9.0.4250-beta": 1, "2.7.3.4215-beta": 2, "2.7.2.4151-beta": 1, "2.2.3.1-signed": 9292}}, {"date": "2017-10-22", "downloads": 12778233, "end": "2017-10-22", "data": {"2.7.3.4196-beta": 5,  "2.4.1": 842, "2.99.0.1872beta": 12, "2.99.0.1857beta": 4, "2.3.1.1-signed": 3, "2.6.10": 11538,  "2.6.4.1-signed": 8, "2.7.3.4198-beta": 4}}]

When I parse them into a pandas DataFrame, I get something like this:

date         downloads  end         data

2017-10-23   15358985   2017-10-23  {"2.7.3.4196-beta": 7, "1.0.1": 268, "1.0.2": 715, "2.9.0.4250-beta": 1, "2.7.3.4215-beta": 2, "2.7.2.4151-beta": 1, "2.2.3.1-signed": 9292}
2017-10-22   12778233   2017-10-22  {"2.7.3.4196-beta": 5,  "2.4.1": 842, "2.99.0.1872beta": 12, "2.99.0.1857beta": 4, "2.3.1.1-signed": 3, "2.6.10": 11538,  "2.6.4.1-signed": 8, "2.7.3.4198-beta": 4}

Note that not every version is downloaded every day. How can I create one column per application version? If a version was not downloaded on a particular day, the cell can be left blank or filled with NaN.

I think you need the DataFrame constructor with reindex to add rows for the missing dates:

j = [{"date": "2017-10-25", "downloads": 15358985, "end": "2017-10-23", "data": {"2.7.3.4196-beta": 7, "1.0.1": 268, "1.0.2": 715, "2.9.0.4250-beta": 1, "2.7.3.4215-beta": 2, "2.7.2.4151-beta": 1, "2.2.3.1-signed": 9292}}, {"date": "2017-10-22", "downloads": 12778233, "end": "2017-10-22", "data": {"2.7.3.4196-beta": 5,  "2.4.1": 842, "2.99.0.1872beta": 12, "2.99.0.1857beta": 4, "2.3.1.1-signed": 3, "2.6.10": 11538,  "2.6.4.1-signed": 8, "2.7.3.4198-beta": 4}}]

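import pandas as pd

# Index the records by date and convert the index to datetimes
df = pd.DataFrame(j).set_index('date')
df.index = pd.to_datetime(df.index)

# Add a NaN-filled row for every calendar day missing between the first and last date
df = df.reindex(pd.date_range(start=df.index.min(), end=df.index.max()))
print(df)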
                                                         data   downloads  \
2017-10-22  {'2.6.4.1-signed': 8, '2.99.0.1857beta': 4, '2...  12778233.0   
2017-10-23                                                NaN         NaN   
2017-10-24                                                NaN         NaN   
2017-10-25  {'2.7.2.4151-beta': 1, '1.0.1': 268, '2.9.0.42...  15358985.0   

                   end  
2017-10-22  2017-10-22  
2017-10-23         NaN  
2017-10-24         NaN  
2017-10-25  2017-10-23  
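
In the frame above, each day's versions are still packed as a dict in the data column. One possible way to get one column per version, as a minimal sketch: build a second DataFrame from the list of dicts and join it back. The sample records are shortened from the question, and the names `records`, `versions` and `wide` are only illustrative.

```python
import pandas as pd

# Shortened sample records in the same shape as the question's data.
records = [
    {"date": "2017-10-23", "downloads": 15358985, "end": "2017-10-23",
     "data": {"2.7.3.4196-beta": 7, "1.0.1": 268}},
    {"date": "2017-10-22", "downloads": 12778233, "end": "2017-10-22",
     "data": {"2.7.3.4196-beta": 5, "2.4.1": 842}},
]

df = pd.DataFrame(records).set_index("date")
df.index = pd.to_datetime(df.index)

# Build one column per version from the list of dicts; a version that is
# absent on a given day becomes NaN.
versions = pd.DataFrame(df.pop("data").tolist(), index=df.index)

# Put the version columns next to downloads/end and order rows by date.
wide = df.join(versions).sort_index()
print(wide)
```

The expansion is done before any reindex here, because rows added by reindex hold NaN rather than a dict in the data column.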

A solution with json_normalize, which creates one column per version; note that if the JSON records have different keys, you get a lot of NaN values:

# pandas >= 1.0; in older versions use: from pandas.io.json import json_normalize
df = pd.json_normalize(j).set_index('date')
df.index = pd.to_datetime(df.index)

df = df.reindex(pd.date_range(start=df.index.min(), end=df.index.max()))
print(df)
            data.1.0.1  data.1.0.2  data.2.2.3.1-signed  data.2.3.1.1-signed  \
2017-10-22         NaN         NaN                  NaN                  3.0   
2017-10-23         NaN         NaN                  NaN                  NaN   
2017-10-24         NaN         NaN                  NaN                  NaN   
2017-10-25       268.0       715.0               9292.0                  NaN   

            data.2.4.1  data.2.6.10  data.2.6.4.1-signed  \
2017-10-22       842.0      11538.0                  8.0   
2017-10-23         NaN          NaN                  NaN   
2017-10-24         NaN          NaN                  NaN   
2017-10-25         NaN          NaN                  NaN   

            data.2.7.2.4151-beta  data.2.7.3.4196-beta  data.2.7.3.4198-beta  \
2017-10-22                   NaN                   5.0                   4.0   
2017-10-23                   NaN                   NaN                   NaN   
2017-10-24                   NaN                   NaN                   NaN   
2017-10-25                   1.0                   7.0                   NaN   

            data.2.7.3.4215-beta  data.2.9.0.4250-beta  data.2.99.0.1857beta  \
2017-10-22                   NaN                   NaN                   4.0   
2017-10-23                   NaN                   NaN                   NaN   
2017-10-24                   NaN                   NaN                   NaN   
2017-10-25                   2.0                   1.0                   NaN   

            data.2.99.0.1872beta   downloads         end  
2017-10-22                  12.0  12778233.0  2017-10-22  
2017-10-23                   NaN         NaN         NaN  
2017-10-24                   NaN         NaN         NaN  
2017-10-25                   NaN  15358985.0  2017-10-23  
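
The question also asks about merging several JSON files into a single CSV. A rough sketch of how that could look with json_normalize and concat is below. The helper `json_files_to_csv`, the `source_file` column and the file names are all hypothetical, and the two tiny files written to a temporary directory only stand in for the real yearly exports.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def json_files_to_csv(json_paths, csv_path):
    """Flatten each JSON file with json_normalize and write one combined CSV."""
    frames = []
    for path in json_paths:
        with open(path, encoding="utf-8") as fh:
            records = json.load(fh)
        frame = pd.json_normalize(records)
        # Hypothetical extra column so each row stays traceable to its file.
        frame["source_file"] = Path(path).name
        frames.append(frame)
    # concat aligns columns by name; versions missing from a file become NaN.
    combined = pd.concat(frames, ignore_index=True, sort=False)
    combined.to_csv(csv_path, index=False)  # NaN is written as an empty field
    return combined


# Demo with two made-up files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    samples = {
        "app_2016.json": [{"date": "2016-10-22", "downloads": 10,
                           "end": "2016-10-22", "data": {"1.0.1": 3}}],
        "app_2017.json": [{"date": "2017-10-22", "downloads": 20,
                           "end": "2017-10-22", "data": {"2.4.1": 8}}],
    }
    for name, records in samples.items():
        (tmp / name).write_text(json.dumps(records), encoding="utf-8")

    combined = json_files_to_csv(sorted(tmp.glob("*.json")), tmp / "combined.csv")
    csv_text = (tmp / "combined.csv").read_text(encoding="utf-8")

print(combined)
```

Versions that never appear in a given file simply end up as empty fields in the CSV, which matches the "leave it blank or fill with NaN" requirement.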

