
Formatting JSON from a pandas dataframe

I'm trying to build a JSON file from my dataframe that looks similar to this:

{'249' : [
          {'candidateId': 751,
           'votes':7528,
           'vote_pct':0.132
          },
          {'candidateId': 803,
           'votes':7771,
           'vote_pct':0.138
          }...
          ],
'274': [
         {'candidateId': 891,
         ....

My dataframe looks like this:

         officeId  candidateId    votes  vote_pct
0        249          751         7528  0.132198
1        249          803         7771  0.136465
2        249          818         7569  0.132918
3        249          827         9089  0.159610
4        249          856         2271  0.039881
5        249          877         7491  0.131548
6        249          878         8758  0.153798 
7        249          895         6267  0.110054
8        249         1161          201  0.003530
9        274          736         4664  0.073833
10       274          737         6270  0.099256
11       274          757         4953  0.078407
12       274          769         5239  0.082935
13       274          770         7134  0.112933
14       274          783         7673  0.121466
15       274          862         6361  0.100697
16       274          901         7671  0.121434

Using a function, I can flip the dataframe's index and return it as a JSON string for each office ID, like this:

def clean_results(votes):
    #trying to get a well structured json file
    return votes.reset_index().to_json(orient='index', double_precision=2)

res_json = results.groupby(['officeId']).apply(clean_results)

But when I do that, I end up with a new dataframe holding one JSON object per officeId, and the JSON uses the numbered index as the top level, like so:

{"0":{"index":0.0,"officeId":249.0,"candidateId":751.0,"total_votes":7528.0,"vote_pct":0.13},"1":{"index":1.0,"officeId":249.0,"candidateId":803.0,"total_votes":7771.0,"vote_pct":0.14},"2":...
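For reference, here is a minimal sketch of how the `orient` argument changes the shape of `to_json` output. The two-row dataframe is a made-up sample borrowing values from the table above, not the real data. With `orient='index'`, each row is keyed by its row label, which is where the "0", "1", ... top-level keys come from; `orient='records'` gives a plain list of row objects instead. The extra "index" field in the output above comes from `reset_index()`, which turns the old index into a column.

```python
import json
import pandas as pd

# Made-up two-row sample (values borrowed from the question's table)
df = pd.DataFrame({"candidateId": [751, 803], "votes": [7528, 7771]})

# orient='index': a dict keyed by row label, one entry per row
by_index = json.loads(df.to_json(orient="index"))

# orient='records': a list with one object per row, no row labels
by_records = json.loads(df.to_json(orient="records"))

print(by_index)
print(by_records)
```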

This is one approach; there may be something cleaner.

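import json

# df is the dataframe from the question.
# Each officeId group becomes a list of row dicts under its string key.
results = {}
for key, df_gb in df.groupby('officeId'):
    results[str(key)] = df_gb.to_dict('records')

print(json.dumps(results, indent=4))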
####
{
    "274": [
        {
            "votes": 4664.0, 
            "candidateId": 736.0, 
            "vote_pct": 0.07383300000000001, 
            "officeId": 274.0
        }, 
        {
            "votes": 6270.0, 
            "candidateId": 737.0, 
            "vote_pct": 0.099255999999999997, 
            "officeId": 274.0
 ......
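The output above still carries `officeId` inside every record and keeps the unrounded `vote_pct`. A possible variation that gets closer to the structure asked for is sketched below. The helper name `group_records`, its parameters, and the three-row sample are assumptions made for illustration; they are not part of the original answer. Going through `to_json` and `json.loads` is one way to make sure the values are plain Python types that `json.dumps` accepts.

```python
import json
import pandas as pd

# Three-row sample taken from the question's table (illustrative only)
df = pd.DataFrame({
    "officeId": [249, 249, 274],
    "candidateId": [751, 803, 736],
    "votes": [7528, 7771, 4664],
    "vote_pct": [0.132198, 0.136465, 0.073833],
})

def group_records(frame, key="officeId", decimals=3):
    """Hypothetical helper: map each value of `key` to a list of row dicts,
    dropping the key column and rounding vote_pct."""
    out = {}
    for group_key, group in frame.groupby(key):
        rows = group.drop(columns=[key]).round({"vote_pct": decimals})
        # Round-trip through JSON so every value is a native Python type
        out[str(group_key)] = json.loads(rows.to_json(orient="records"))
    return out

results = group_records(df)
print(json.dumps(results, indent=4))
```

Writing the result to a file would then be a matter of calling `json.dump(results, f, indent=4)` on an open file handle `f`.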


 