
Export Pandas DF to nested JSON (multiple nesting)

I want to export a Pandas df to a nested JSON for ingestion in MongoDB.

Here's an example of the data:

import pandas as pd

data = {
    'product_id': ['a001', 'a001', 'a001'],
    'product': ['aluminium', 'aluminium', 'aluminium'],
    'production_id': ['b001', 'b002', 'b002'],
    'production_name': ['metallurgical', 'recycle', 'recycle'],
    'geo_name': ['US', 'EU', 'RoW'],
    'value': [100, 200, 200]
}
df = pd.DataFrame(data=data)

product_id  product    production_id  production_name  geo_name  value
a001        aluminium  b001           metallurgical    US        100
a001        aluminium  b002           recycle          EU        200
a001        aluminium  b002           recycle          RoW       200

And this is what the final JSON should look like:

{
    "name_id": "a001",
    "name": "aluminium",
    "activities": [
        {
            "product_id": "b001",
            "product_name": "metallurgical",
            "regions": [
                {
                    "geo_name": "US",
                    "value": 100
                }
            ]
        },
        {
            "product_id": "b002",
            "product_name": "recycle",
            "regions": [
                {
                    "geo_name": "EU",
                    "value": 200
                },
                {
                    "geo_name": "RoW",
                    "value": 200
                }
            ]
        }
    ]
}
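
The target above uses key names that differ from the DataFrame columns. A minimal sketch of that mapping follows, assuming it can be read off the example values (for instance, "b001" sits under production_id in the data but under "product_id" in the target). The `target_keys` dict and the `renamed` variable are illustrative names, not part of the original post.

```python
import pandas as pd

df = pd.DataFrame({
    'product_id': ['a001', 'a001', 'a001'],
    'product': ['aluminium', 'aluminium', 'aluminium'],
    'production_id': ['b001', 'b002', 'b002'],
    'production_name': ['metallurgical', 'recycle', 'recycle'],
    'geo_name': ['US', 'EU', 'RoW'],
    'value': [100, 200, 200],
})

# Assumed mapping from DataFrame columns to the keys in the target JSON,
# inferred from the example values rather than stated in the post.
target_keys = {
    'product_id': 'name_id',
    'product': 'name',
    'production_id': 'product_id',
    'production_name': 'product_name',
}

# rename() maps each label independently, so 'production_id' can take the
# name 'product_id' in the same call that moves 'product_id' to 'name_id'.
renamed = df.rename(columns=target_keys)
print(list(renamed.columns))
```

Renaming before grouping keeps the nesting step itself independent of the naming question.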

There are some questions close to my problem, but they are either years old and refer to older versions of Pandas for which the solutions now break, or they do not group and nest the JSON the way I need (this one, for example, handles only a single level: How to create a nested JSON from pandas DataFrame?).

Some help would be really appreciated.

I found the easiest solution, which extends to any number of nesting levels (2 in this example):

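json_extract = (
    df
    # innermost level: collect the geo rows of each production into a list
    .groupby(['product_id', 'product', 'production_id', 'production_name'])
    .apply(lambda x: x[['geo_name', 'value']].to_dict('records'))
    .reset_index(name='geos')
    # outer level: collect the productions of each product into a list
    .groupby(['product_id', 'product'])
    .apply(lambda x: x[['production_id', 'production_name', 'geos']].to_dict('records'))
    .reset_index(name='production')
    .to_json(orient='records')
)

The resulting JSON string, pretty-printed:
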
[
    {
        "product_id": "a001",
        "product": "aluminium",
        "production": [
            {
                "production_id": "b001",
                "production_name": "metallurgical",
                "geos": [
                    {
                        "geo_name": "US",
                        "value": 100
                    }
                ]
            },
            {
                "production_id": "b002",
                "production_name": "recycle",
                "geos": [
                    {
                        "geo_name": "EU",
                        "value": 200
                    },
                    {
                        "geo_name": "RoW",
                        "value": 200
                    }
                ]
            }
        ]
    }
]
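
The same groupby-then-collect pattern can be written as a loop so that the number of levels becomes a parameter. The following is only a sketch under stated assumptions: `nest` and its `levels` argument are hypothetical names introduced here, and it builds plain Python dicts with an explicit loop over the groups instead of the `.apply` chain above.

```python
import pandas as pd

def nest(df, levels):
    """Fold a flat DataFrame into nested records (hypothetical helper).

    `levels` runs from outermost to innermost; each entry is
    (group_columns, child_key), where child_key names the list that
    holds the rows grouped under those columns.
    """
    grouped = [col for cols, _ in levels for col in cols]
    # Columns that stay as leaf fields at the deepest level.
    payload = [col for col in df.columns if col not in grouped]
    out = df
    # Work from the innermost level outwards.
    for depth in range(len(levels), 0, -1):
        keys = [col for cols, _ in levels[:depth] for col in cols]
        level_cols, child_key = levels[depth - 1]
        rows = []
        for key_values, group in out.groupby(keys, sort=False):
            if not isinstance(key_values, tuple):
                key_values = (key_values,)  # single key on older pandas
            row = dict(zip(keys, key_values))
            row[child_key] = group[payload].to_dict('records')
            rows.append(row)
        out = pd.DataFrame(rows)
        # One level up, this level's columns and its list become the payload.
        payload = list(level_cols) + [child_key]
    return out.to_dict('records')

df = pd.DataFrame({
    'product_id': ['a001', 'a001', 'a001'],
    'product': ['aluminium', 'aluminium', 'aluminium'],
    'production_id': ['b001', 'b002', 'b002'],
    'production_name': ['metallurgical', 'recycle', 'recycle'],
    'geo_name': ['US', 'EU', 'RoW'],
    'value': [100, 200, 200],
})

nested = nest(df, [
    (['product_id', 'product'], 'production'),
    (['production_id', 'production_name'], 'geos'),
])
```

Here `nested` is a list of dicts rather than a JSON string, which is the shape PyMongo's `insert_many` accepts. Adding a third level only means adding a third `(columns, key)` pair to the list.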
