
How to generate a JSON file with a nested dictionary from a pandas DataFrame?

I need to generate a JSON file with a specific format from a pandas DataFrame. The DataFrame looks like this:

user_id  product_id  date
1        23          01-01-2022
1        24          05-01-2022
2        56          05-06-2022
3        23          02-07-2022
3        24          01-02-2022
3        56          02-01-2022
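To reproduce the answers below without the original file, the sample frame can be built by hand. This is a sketch that assumes the date column holds day-month-year strings; the second answer instead assumes it is read from Excel as datetimes.

```python
import pandas as pd

# Hand-built copy of the sample data; dates kept as day-month-year strings
df = pd.DataFrame({
    "user_id":    [1, 1, 2, 3, 3, 3],
    "product_id": [23, 24, 56, 23, 24, 56],
    "date": ["01-01-2022", "05-01-2022", "05-06-2022",
             "02-07-2022", "01-02-2022", "02-01-2022"],
})
print(df)
```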

The JSON file needs to have the following format:

{
  "user_id": 1,
  "items": [{
        "product_id": 23,
        "date": "01-01-2022"
        }, {
        "product_id": 24,
        "date": "05-01-2022"
        }]
}
{
  "user_id": 2,
  "items": [{
        "product_id": 56,
        "date": "05-06-2022"
        }]
}
...etc

I've tried the following, but it doesn't produce the right format:

# dict(x.values) pairs each product_id with its date, not a list of item objects
result = (now.groupby('user_id')[['product_id','date']].apply(lambda x: dict(x.values)).to_json())

Any help would be much appreciated.
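Why the attempt above gives the wrong shape: dict() treats each two-column [product_id, date] row as a key/value pair, so each user gets a single product_id-to-date mapping. to_dict('records') instead returns one dict per row keyed by column name, which is what the items list needs. A small comparison on user 1's rows, using a hand-built copy of the sample data:

```python
import pandas as pd

# Hand-built copy of user 1's rows from the question
df = pd.DataFrame({
    "user_id": [1, 1],
    "product_id": [23, 24],
    "date": ["01-01-2022", "05-01-2022"],
})
sub = df[["product_id", "date"]]

# dict() consumes each row as a (key, value) pair: maps 23 and 24 to their dates
pairs = dict(sub.values)
print(pairs)

# to_dict('records') keeps column names as keys: one dict per row
records = sub.to_dict("records")
print(records)
```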

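# Turn each row into a {'product_id': ..., 'date': ...} dict,
# collect those dicts into a list per user, then emit one record per user
out = (df[['product_id','date']].apply(dict, axis=1)
       .groupby(df['user_id']).apply(list)
       .to_frame('items').reset_index()
       .to_dict('records'))
print(out)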

[{'user_id': 1, 'items': [{'product_id': 23, 'date': '01-01-2022'}, {'product_id': 24, 'date': '05-01-2022'}]},
{'user_id': 2, 'items': [{'product_id': 56, 'date': '05-06-2022'}]}, 
{'user_id': 3, 'items': [{'product_id': 23, 'date': '02-07-2022'}, {'product_id': 24, 'date': '01-02-2022'}, {'product_id': 56, 'date': '02-01-2022'}]}]
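Since out is a plain list of dicts, the standard json module can write it to the file the question asks for. A minimal sketch, assuming a hand-built copy of the sample data and a hypothetical output path output.json:

```python
import json
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "product_id": [23, 24, 56, 23, 24, 56],
    "date": ["01-01-2022", "05-01-2022", "05-06-2022",
             "02-07-2022", "01-02-2022", "02-01-2022"],
})

# Same pipeline as above: one dict per row, grouped into a list per user
out = (df[["product_id", "date"]].apply(dict, axis=1)
       .groupby(df["user_id"]).apply(list)
       .to_frame("items").reset_index()
       .to_dict("records"))

# default=int converts any NumPy integer scalars, which json cannot serialize
with open("output.json", "w", encoding="utf-8") as f:
    json.dump(out, f, indent=4, default=int)
```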

The code below solves the issue. It first converts the date column from datetime to string, then reshapes the DataFrame into the desired format.
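The string conversion matters because, in current pandas releases, to_json writes datetime columns as epoch milliseconds by default rather than as 01-01-2022. A small check, assuming the dates arrive as a datetime64 column (the one-row frame is hypothetical):

```python
import pandas as pd

# Hypothetical one-row frame with a datetime64 date column
frame = pd.DataFrame({"product_id": [23], "date": pd.to_datetime(["2022-01-01"])})

# Default serialization: milliseconds since the Unix epoch
print(frame.to_json(orient="records"))   # [{"product_id":23,"date":1640995200000}]

# Vectorized alternative to .apply(lambda x: x.strftime(...))
frame["date"] = frame["date"].dt.strftime("%d-%m-%Y")
print(frame.to_json(orient="records"))   # [{"product_id":23,"date":"01-01-2022"}]
```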

Here, data is your table, saved as the Excel file data.xlsx.

# Import libraries (openpyxl must be installed for read_excel to open .xlsx files)
import pandas as pd
import json

# Read the Excel data
data = pd.read_excel("data.xlsx", sheet_name=0)

# Format the datetime date column as day-month-year strings
data['date'] = data['date'].apply(lambda x: x.strftime('%d-%m-%Y'))

# Convert to the desired JSON format: one record per user with an 'items' list
json_data = (data.groupby('user_id')[['product_id', 'date']]
               .apply(lambda x: x.to_dict('records'))
               .reset_index(name='items')
               .to_json(orient='records'))

# Pretty print the result
# https://stackoverflow.com/a/12944035/10905535
json_data = json.loads(json_data)
print(json.dumps(json_data, indent=4, sort_keys=False))

The output:

[
    {
        "user_id": 1,
        "items": [
            {
                "product_id": 23,
                "date": "01-01-2022"
            },
            {
                "product_id": 24,
                "date": "05-01-2022"
            }
        ]
    },
    {
        "user_id": 2,
        "items": [
            {
                "product_id": 56,
                "date": "05-06-2022"
            }
        ]
    },
    {
        "user_id": 3,
        "items": [
            {
                "product_id": 23,
                "date": "02-07-2022"
            },
            {
                "product_id": 24,
                "date": "01-02-2022"
            },
            {
                "product_id": 56,
                "date": "02-01-2022"
            }
        ]
    }
]
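The json.loads/json.dumps round trip in the answer above is only there for pretty-printing. to_json can also take a file path and an indent argument (pandas 1.0+), and lines=True writes one compact object per line (JSON Lines), which is close to the one-object-per-user layout shown in the question. A sketch assuming a hand-built copy of the sample data and hypothetical file names output.json and output.jsonl:

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "product_id": [23, 24, 56, 23, 24, 56],
    "date": ["01-01-2022", "05-01-2022", "05-06-2022",
             "02-07-2022", "01-02-2022", "02-01-2022"],
})

# One row per user; 'items' holds a list of {product_id, date} dicts
nested = (df.groupby("user_id")[["product_id", "date"]]
            .apply(lambda g: g.to_dict("records"))
            .reset_index(name="items"))

# A single JSON array, pretty-printed straight to disk
nested.to_json("output.json", orient="records", indent=4)

# JSON Lines: one object per user, one per line
nested.to_json("output.jsonl", orient="records", lines=True)
```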

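Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.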

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM