
How do I split out a nested JSON string into 3 columns and relate it to the user_id column in a dataframe?

I currently have a dataframe with 2 columns: user_id and items. Example data:

user_id = 01e716c9bec1423e1

items = [{'item_id': '31499834785910', 'price': 3000.0, 'quantity': 2.0}, {'item_id': '31919169077366', 'price': 2500.0, 'quantity': 1.0}, {'item_id': '32130388426870', 'price': 5000.0, 'quantity': 1.0}, {'item_id': '22640717824118', 'price': 2000.0, 'quantity': 1.0}, {'item_id': '32044129157238', 'price': 3000.0, 'quantity': 1.0}, {'item_id': '31492182245494', 'price': 1500.0, 'quantity': 1.0}]

Items can contain more nested items, fewer, or even none. What I want as an end product is:

A dataframe with the columns ['user_id', 'item_id', 'price', 'quantity'], with one row per item.

So far I have tried:

import pandas as pd
import ast
import numpy as np
import pyodbc
import json

mylist = list(df['items'])
mynewlist = []
for l in mylist:
    mynewlist.append(ast.literal_eval(l))
data_items = pd.DataFrame(mynewlist)
data_new = pd.concat([df,data_items],axis=1)
del data_new['items']

but this just messes up the entire dataframe, adds about 40 columns of NaN, and still doesn't break up the JSON.

I have found a few answers on this, but none of them seem to help me. Any help would be greatly appreciated. I have also tried json_normalize and can't seem to figure it out.

I feel as though this is a detailed question, and apologies for not providing the data in table format, as I can't figure out how to do that. If you need more info, please let me know.

You can use a simple for loop to add the user_id key and value to each dictionary in the items list:

import pandas as pd

user_id = '01e716c9bec1423e1'

items = [{'item_id': '31499834785910', 'price': 3000.0, 'quantity': 2.0},
         {'item_id': '31919169077366', 'price': 2500.0, 'quantity': 1.0},
         {'item_id': '32130388426870', 'price': 5000.0, 'quantity': 1.0}, 
         {'item_id': '22640717824118', 'price': 2000.0, 'quantity': 1.0},
         {'item_id': '32044129157238', 'price': 3000.0, 'quantity': 1.0},
         {'item_id': '31492182245494', 'price': 1500.0, 'quantity': 1.0}]

# add the user_id to each dictionary
for item in items:
    item['user_id'] = user_id

df = pd.DataFrame(items)

print(df)

Output:

          item_id   price  quantity            user_id
0  31499834785910  3000.0       2.0  01e716c9bec1423e1
1  31919169077366  2500.0       1.0  01e716c9bec1423e1
2  32130388426870  5000.0       1.0  01e716c9bec1423e1
3  22640717824118  2000.0       1.0  01e716c9bec1423e1
4  32044129157238  3000.0       1.0  01e716c9bec1423e1
5  31492182245494  1500.0       1.0  01e716c9bec1423e1
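
To apply the same loop idea to a whole dataframe rather than a single user, one possible sketch is shown below. It assumes the items column holds string representations of Python lists of dicts (which is why ast.literal_eval is used, as in the question). The sample frame and the ids user_b and user_c are made up for illustration.

```python
import ast
import pandas as pd

# Hypothetical input: 'items' holds strings that look like lists of dicts
df = pd.DataFrame({
    "user_id": ["01e716c9bec1423e1", "user_b", "user_c"],
    "items": [
        "[{'item_id': '31499834785910', 'price': 3000.0, 'quantity': 2.0}, "
        "{'item_id': '31919169077366', 'price': 2500.0, 'quantity': 1.0}]",
        "[{'item_id': '32130388426870', 'price': 5000.0, 'quantity': 1.0}]",
        "[]",  # a user with no items
    ],
})

rows = []
for user_id, raw in zip(df["user_id"], df["items"]):
    # Parse the string into a real list of dicts (assumes valid Python literals)
    items = ast.literal_eval(raw) if isinstance(raw, str) else raw
    for item in items:
        # One output row per item, tagged with the owning user_id
        rows.append({"user_id": user_id, **item})

result = pd.DataFrame(rows, columns=["user_id", "item_id", "price", "quantity"])
print(result)
```

Note that a user whose items list is empty contributes no rows in this sketch. If such users should be kept, a placeholder row would have to be appended for them.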

An alternative without using a loop is:

import pandas as pd

user_id = ['01e716c9bec1423e1']

items = [{'item_id': '31499834785910', 'price': 3000.0, 'quantity': 2.0},
         {'item_id': '31919169077366', 'price': 2500.0, 'quantity': 1.0},
         {'item_id': '32130388426870', 'price': 5000.0, 'quantity': 1.0},
         {'item_id': '22640717824118', 'price': 2000.0, 'quantity': 1.0},
         {'item_id': '32044129157238', 'price': 3000.0, 'quantity': 1.0},
         {'item_id': '31492182245494', 'price': 1500.0, 'quantity': 1.0}]

df = pd.DataFrame(items)

# since user_id is a list, you just multiply by len(df) to have a list with the compatible length
df['user_id'] = user_id * len(df)
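
For a dataframe with many users, a loop-free variant might combine DataFrame.explode with pd.json_normalize. This is only a sketch: it assumes pandas 1.0 or later and that the items column holds strings that ast.literal_eval can parse. The sample frame and the ids user_b and user_c are hypothetical.

```python
import ast
import pandas as pd

# Hypothetical input: 'items' holds strings that look like lists of dicts
df = pd.DataFrame({
    "user_id": ["01e716c9bec1423e1", "user_b", "user_c"],
    "items": [
        "[{'item_id': '31499834785910', 'price': 3000.0, 'quantity': 2.0}, "
        "{'item_id': '31919169077366', 'price': 2500.0, 'quantity': 1.0}]",
        "[{'item_id': '32130388426870', 'price': 5000.0, 'quantity': 1.0}]",
        "[]",  # a user with no items
    ],
})

# Turn each string into a real list of dicts
df["items"] = df["items"].apply(ast.literal_eval)

# One row per item; an empty list becomes a single NaN row, which is dropped here
exploded = df.explode("items").dropna(subset=["items"]).reset_index(drop=True)

# Expand each dict into item_id / price / quantity columns
item_cols = pd.json_normalize(exploded["items"].tolist())

# Put user_id next to the expanded columns (both frames share a 0..n-1 index)
result = pd.concat([exploded[["user_id"]], item_cols], axis=1)
print(result)
```

The reset_index(drop=True) call matters here: explode repeats the original index for each item, so without it the concat along axis=1 would not line up row by row.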


