
Convert dictionary to DataFrame in Python

How can I read a file into a DataFrame with pandas in Python?

The file contains the following:

{"headers": {"ai5": "8fa683e59c02c04cb781ac689686db07", "debug": null, "random": null, "sdkv": "7.6"}, "post": {"event": "ggstart", "ts": "1462759195259"}, "params": {}, "bottle": {"timestamp": "2016-05-09 02:00:00.004906", "game_id": "55107008"}}
{"headers": {"ai5": "335644267c1d5f04eaea7bc6f51b1861", "debug": null, "random": null, "sdkv": "7.6"}, "post": {"event": "ggstart", "ts": "1462759189745"}, "params": {}, "bottle": {"timestamp": "2016-05-09 02:00:00.033775", "game_id": "55107008"}}

... and many more lines below.

How can I load it into a DataFrame, with the dictionary keys as column headers?

You can first use read_json with the parameter lines=True, which parses each line of the file as a separate JSON object:

df = pd.read_json('file.json', lines=True)
print (df)
                                              bottle  \
0  {'timestamp': '2016-05-09 02:00:00.004906', 'g...   
1  {'timestamp': '2016-05-09 02:00:00.033775', 'g...   

                                             headers params  \
0  {'ai5': '8fa683e59c02c04cb781ac689686db07', 'r...     {}   
1  {'ai5': '335644267c1d5f04eaea7bc6f51b1861', 'r...     {}   

                                          post  
0  {'event': 'ggstart', 'ts': '1462759195259'}  
1  {'event': 'ggstart', 'ts': '1462759189745'}
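To try lines=True without a file on disk, the sketch below feeds the two sample records from the question through io.StringIO, which stands in for 'file.json' (an assumption for illustration only). Each line becomes one row and each top-level key one column; the cells still hold the nested dictionaries, which is why a second step is needed to expand them.

```python
import io

import pandas as pd

# The two sample records from the question, one JSON object per line (JSON Lines).
SAMPLE = (
    '{"headers": {"ai5": "8fa683e59c02c04cb781ac689686db07", "debug": null, "random": null, "sdkv": "7.6"}, '
    '"post": {"event": "ggstart", "ts": "1462759195259"}, "params": {}, '
    '"bottle": {"timestamp": "2016-05-09 02:00:00.004906", "game_id": "55107008"}}\n'
    '{"headers": {"ai5": "335644267c1d5f04eaea7bc6f51b1861", "debug": null, "random": null, "sdkv": "7.6"}, '
    '"post": {"event": "ggstart", "ts": "1462759189745"}, "params": {}, '
    '"bottle": {"timestamp": "2016-05-09 02:00:00.033775", "game_id": "55107008"}}\n'
)

# StringIO stands in for 'file.json'; lines=True parses each line as its own record.
df = pd.read_json(io.StringIO(SAMPLE), lines=True)

print(df.shape)                 # one row per line, one column per top-level key
print(type(df.loc[0, "post"]))  # cells still hold the nested dicts
```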

Then concat the nested dictionaries; the output has a MultiIndex in the columns:

df = pd.concat([pd.DataFrame(df[x].values.tolist()) for x in df], axis=1, keys=df.columns)
print (df)
     bottle                                                       headers  \
    game_id                   timestamp                               ai5   
0  55107008  2016-05-09 02:00:00.004906  8fa683e59c02c04cb781ac689686db07   
1  55107008  2016-05-09 02:00:00.033775  335644267c1d5f04eaea7bc6f51b1861   

                        post                 
  debug random sdkv    event             ts  
0  None   None  7.6  ggstart  1462759195259  
1  None   None  7.6  ggstart  1462759189745  
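The concat step can be wrapped in a small function. expand_nested below is a hypothetical helper name, and the two-column input is a cut-down stand-in for the read_json result. Unlike the one-liner above, it passes index=df.index to each inner DataFrame, a small addition that keeps rows aligned if the frame does not use the default 0..n-1 index.

```python
import pandas as pd


def expand_nested(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: expand every dict-valued column into sub-columns.

    Each column's list of dicts becomes its own DataFrame; concat along axis=1
    with keys=df.columns puts the original column name on the outer level of
    a column MultiIndex.
    """
    parts = [pd.DataFrame(df[col].tolist(), index=df.index) for col in df]
    return pd.concat(parts, axis=1, keys=df.columns)


# A cut-down stand-in for the frame returned by read_json(..., lines=True).
raw = pd.DataFrame({
    "post": [{"event": "ggstart", "ts": "1462759195259"},
             {"event": "ggstart", "ts": "1462759189745"}],
    "bottle": [{"timestamp": "2016-05-09 02:00:00.004906", "game_id": "55107008"},
               {"timestamp": "2016-05-09 02:00:00.033775", "game_id": "55107008"}],
})

wide = expand_nested(raw)
print(wide[("post", "ts")].tolist())  # select one sub-column by its (outer, inner) tuple
```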

A slower solution uses apply(pd.Series) (again starting from the DataFrame returned by read_json):

df = pd.concat([df[x].apply(pd.Series) for x in df], axis=1, keys=df.columns)
print (df)
     bottle                                                       headers  \
    game_id                   timestamp                               ai5   
0  55107008  2016-05-09 02:00:00.004906  8fa683e59c02c04cb781ac689686db07   
1  55107008  2016-05-09 02:00:00.033775  335644267c1d5f04eaea7bc6f51b1861   

                        post                 
  debug random sdkv    event             ts  
0  None   None  7.6  ggstart  1462759195259  
1  None   None  7.6  ggstart  1462759189745  
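A likely reason apply(pd.Series) is slower is that it constructs one Series object per row before stitching them into a frame, whereas tolist() hands a plain list of dicts to a single DataFrame constructor call. The rough timing sketch below uses hypothetical data (10,000 copies of one post record) and the hypothetical helper names via_tolist and via_apply; timings depend on the machine and pandas version, so no numbers are claimed here.

```python
import timeit

import pandas as pd

# Hypothetical test data: 10,000 copies of one nested "post" record.
posts = pd.Series([{"event": "ggstart", "ts": "1462759195259"}] * 10_000)


def via_tolist(s: pd.Series) -> pd.DataFrame:
    # One DataFrame constructor call over a plain list of dicts.
    return pd.DataFrame(s.tolist(), index=s.index)


def via_apply(s: pd.Series) -> pd.DataFrame:
    # Builds one pd.Series per row, then combines them into a DataFrame.
    return s.apply(pd.Series)


fast, slow = via_tolist(posts), via_apply(posts)

# Time one call of each; results vary by machine and pandas version.
print("tolist:", timeit.timeit(lambda: via_tolist(posts), number=1))
print("apply: ", timeit.timeit(lambda: via_apply(posts), number=1))
```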

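To flatten the MultiIndex into single-level column names, add map to the concat solution (again starting from the DataFrame returned by read_json):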

df = pd.concat([pd.DataFrame(df[x].values.tolist()) for x in df], axis=1, keys=df.columns)
df.columns = df.columns.map('_'.join)
print (df)
  bottle_game_id            bottle_timestamp  \
0       55107008  2016-05-09 02:00:00.004906   
1       55107008  2016-05-09 02:00:00.033775   

                        headers_ai5 headers_debug headers_random headers_sdkv  \
0  8fa683e59c02c04cb781ac689686db07          None           None          7.6   
1  335644267c1d5f04eaea7bc6f51b1861          None           None          7.6   

  post_event        post_ts  
0    ggstart  1462759195259  
1    ggstart  1462759189745  
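map('_'.join) works here because every key in the file is a string. If a column level ever holds non-string keys, such as integers, '_'.join raises TypeError. A hedged variant is sketched below; flatten_columns is a hypothetical helper, and the mixed str/int columns are made-up test data.

```python
import pandas as pd


def flatten_columns(df: pd.DataFrame, sep: str = "_") -> pd.DataFrame:
    """Hypothetical helper: join MultiIndex column levels into single strings.

    str() is applied to every level, because '_'.join raises TypeError when a
    level holds a non-string key such as an int.
    """
    out = df.copy()
    out.columns = [sep.join(map(str, col)) for col in df.columns]
    return out


# Made-up columns whose inner level mixes str and int keys.
cols = pd.MultiIndex.from_tuples([("post", "event"), ("scores", 0), ("scores", 1)])
df = pd.DataFrame([["ggstart", 10, 20]], columns=cols)

print(flatten_columns(df).columns.tolist())  # ['post_event', 'scores_0', 'scores_1']
```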

You can use Python's open + readlines to create a pd.Series object, then combine json.loads and json_normalize:

import json
import pandas as pd

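# pd.json_normalize replaces pd.io.json.json_normalize, deprecated since pandas 1.0
with open('file.json') as f:
    df = pd.json_normalize(pd.Series(f.readlines()).apply(json.loads).tolist())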
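json_normalize joins nested keys with "." by default (for example bottle.game_id). Passing sep="_" should give the same underscore names as the map('_'.join) approach. The sketch below assumes the lines are already in memory; in practice they would come from reading 'file.json'. Empty dicts such as params produce no columns, as in the concat output.

```python
import json

import pandas as pd

# The two sample lines from the question, standing in for the contents of 'file.json'.
lines = [
    '{"headers": {"ai5": "8fa683e59c02c04cb781ac689686db07", "debug": null, "random": null, "sdkv": "7.6"}, "post": {"event": "ggstart", "ts": "1462759195259"}, "params": {}, "bottle": {"timestamp": "2016-05-09 02:00:00.004906", "game_id": "55107008"}}',
    '{"headers": {"ai5": "335644267c1d5f04eaea7bc6f51b1861", "debug": null, "random": null, "sdkv": "7.6"}, "post": {"event": "ggstart", "ts": "1462759189745"}, "params": {}, "bottle": {"timestamp": "2016-05-09 02:00:00.033775", "game_id": "55107008"}}',
]

# Parse each line into a dict, skipping blank lines such as a trailing newline.
records = [json.loads(line) for line in lines if line.strip()]

# sep="_" joins nested keys with an underscore instead of the default ".",
# e.g. "bottle_game_id", matching the names produced by map('_'.join).
flat = pd.json_normalize(records, sep="_")
print(sorted(flat.columns))
```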



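Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.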

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM