Pandas group by into JSON format

Question

I have a pandas DataFrame with columns ['region', 'country'... etc]. I want to get the JSON format below from the df.

[{
    "total_count": 400,
    "geo": "Total",
    "drill_through": [{
        "total_count": 180,
        "geo": "EURO",
        "drill_through": [{
            "total_count": 100,
            "geo": "UK",
            "drill_through": []
        }, {
            "total_count": 80,
            "geo": "Ireland",
            "drill_through": []
        }]
    }, {
        "total_count": 130,
        "geo": "AMR",
        "drill_through": [{
            "total_count": 20,
            "geo": "Mexico",
            "drill_through": []
        }, {
            "total_count": 110,
            "geo": "California",
            "drill_through": []
        }]
    }, {
        "total_count": 90,
        "geo": "APAC",
        "drill_through": [{
            "total_count": 90,
            "geo": "Japan",
            "drill_through": []
        }]
    }]
}]

I grouped the df as follows:

df.groupby(['region', 'country']).size().reset_index(name='count')

Output of the group by above:

region      country         count
EURO        UK              100
            Ireland         80
AMR         Mexico          20
            California      110
APAC        Japan           90

How can I achieve the JSON format above? Thanks in advance.

Answer 1

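I'm answering this without the full context of your starting DataFrame's structure. I created a simple DataFrame as a starting point and show how to get something similar to the nested structure you are looking for.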

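Note that I omitted the "drill_through" element at the country level, which you show as an empty array, because I wasn't sure what you would include as children of a country. If you do want it, though, adding an empty array to each country element as it is created should be trivial.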

import pandas as pd
import json

df = pd.DataFrame(
    data=[
        ("EURO","UK",100),
        ("EURO","Ireland",80),
        ("AMR","Mexico",20),
        ("AMR","California",110),
        ("APAC","Japan",90)
    ], 
    columns=["region","country","total_count"]
)

#First get the regions into a list of dictionary objects
regions = df[["region", "total_count"]].groupby("region").sum()
regions["geo"] = regions.index.values
regions = regions.to_dict(orient="records") 

#now add the countries to each region dictionary
for region in regions:
    countries = df[df["region"] == region["geo"]].drop("region", axis=1).groupby("country").sum()
    countries["geo"] = countries.index.values
    region["drill_through"] = countries.to_dict(orient="records")
    
#Serialize the list of regions as JSON
json_str = json.dumps(regions, indent=4)

print(json_str)

Output:

[
    {
        "total_count": 130,
        "geo": "AMR",
        "drill_through": [
            {
                "total_count": 110,
                "geo": "California"
            },
            {
                "total_count": 20,
                "geo": "Mexico"
            }
        ]
    },
    {
        "total_count": 90,
        "geo": "APAC",
        "drill_through": [
            {
                "total_count": 90,
                "geo": "Japan"
            }
        ]
    },
    {
        "total_count": 180,
        "geo": "EURO",
        "drill_through": [
            {
                "total_count": 80,
                "geo": "Ireland"
            },
            {
                "total_count": 100,
                "geo": "UK"
            }
        ]
    }
]
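
To get closer to the exact format in the question, the same idea can be extended with a top-level "Total" node and an empty "drill_through" list on each country. The following is only a sketch under a few assumptions: the helper name build_drill_through is made up for illustration, the sample counts are the ones used above, and with raw rows you would first produce the counts with something like df.groupby(['region', 'country']).size().reset_index(name='total_count').

```python
import json

import pandas as pd

# Same sample counts as in the answer above (assumed starting point).
counts = pd.DataFrame(
    [
        ("EURO", "UK", 100),
        ("EURO", "Ireland", 80),
        ("AMR", "Mexico", 20),
        ("AMR", "California", 110),
        ("APAC", "Japan", 90),
    ],
    columns=["region", "country", "total_count"],
)


def build_drill_through(counts):
    """Hypothetical helper: nest countries under regions under one "Total" node."""
    regions = []
    # sort=False keeps regions in order of first appearance (EURO, AMR, APAC).
    for region, group in counts.groupby("region", sort=False):
        countries = [
            # int() converts NumPy integers to plain ints so json.dumps accepts them.
            {"total_count": int(n), "geo": country, "drill_through": []}
            for country, n in zip(group["country"], group["total_count"])
        ]
        regions.append(
            {
                "total_count": int(group["total_count"].sum()),
                "geo": region,
                "drill_through": countries,
            }
        )
    return [
        {
            "total_count": int(counts["total_count"].sum()),
            "geo": "Total",
            "drill_through": regions,
        }
    ]


result = build_drill_through(counts)
print(json.dumps(result, indent=4))
```

Passing sort=False is a design choice here: it keeps regions and countries in the order they appear in the data, which matches the ordering shown in the question, whereas the default groupby sorts the keys alphabetically.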
