
python pandas - add unique Ids in column from master df back in to processed dfs stored in list of dataframes

I have a single df in which each row contains a JSON string with multiple nested records that need reading and normalizing.

I can read out the JSON info and normalize the columns by storing each row as a new dataframe in a list, which I have done with the code below.

However, I need to append the original unique Id from the original df (i.e. 'id': ['9clpa','g659am']), which is lost in my current code.

The expected output is a list of dataframes, one per Id, that include the exploded JSON info plus an additional column holding the Id (which will be repeated for each row of the final df).

I hope that makes sense; any suggestions are very welcome. Thanks so much.

dataframe

df = pd.DataFrame(data={'id': ['9clpa','g659am'],'i2': [('{"t":"unique678","q":[{"qi":"01","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}]},{"qi":"02","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"} ]}]}'),('{"t":"unique428","q":[{"qi":"01","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}]},{"qi":"02","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"} ]}]}')]})

current code

out={}
for i in range(len(df)):
    out[i] = pd.read_json(df.i2[i])
    out[i] = pd.json_normalize(out[i].q)

expected output

pd.DataFrame(data={'id': ['9clpa','9clpa'],'qi': ['01','02'], 'answers': ['{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}', '"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"']})
pd.DataFrame(data={'id': ['g659am','g659am'],'qi': ['01','02'], 'answers': ['{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}', '"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"']})
df = pd.DataFrame(data={'id': ['9clpa','g659am'],'i2': [('{"t":"unique678","q":[{"qi":"01","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}]},{"qi":"02","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"} ]}]}'),('{"t":"unique428","q":[{"qi":"01","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}]},{"qi":"02","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"} ]}]}')]})
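import io

out = {}
columns1 = ['id', 'qi', 'answers']  # column order of the final dataframe
for i in range(len(df)):
    # wrap the string in StringIO: newer pandas deprecates passing literal JSON to read_json
    out[i] = pd.read_json(io.StringIO(df.i2[i]))
    out[i] = pd.json_normalize(out[i].q)
    # 'id' is not in out[i] yet, so this creates it as an empty (NaN) column
    df_new = pd.DataFrame(data=out[i], columns=columns1)
    df_new = df_new.assign(id=lambda x: df.id[i])  # fill 'id' with this row's Id
    display(df_new)  # display() is available in Jupyter/IPython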

You can add a lambda function that assigns the value of 'id' to the newly formed df.

Edit: The position of 'id' in columns1 determines where the column appears when the dataframe is created, so place it wherever you want it.

Output dataframe: [image not included]
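As a small illustration of the idea above, the sketch below shows two ways to attach an Id to an already normalized frame and control where the column ends up. The names `normalized`, `row_id`, `with_id` and `front` are made up for this example; `normalized` stands in for one `out[i]` and `row_id` for `df.id[i]`.

```python
import pandas as pd

# Hypothetical stand-in for one normalized frame (out[i] in the loop above)
normalized = pd.DataFrame({
    'qi': ['01', '02'],
    'answers': [
        [{'answer': 'M', 'value': '1'}],
        [{'answer': 'M', 'value': '1'}, {'answer': 'F', 'value': '2'}],
    ],
})
row_id = '9clpa'  # stands in for df.id[i]

# assign() broadcasts a scalar to every row; a lambda is only needed
# when the new value depends on the frame itself
with_id = normalized.assign(id=row_id)

# assign() appends the new column at the end, so reorder explicitly
with_id = with_id[['id', 'qi', 'answers']]

# Alternative: insert() puts the column at a chosen position (modifies in place)
front = normalized.copy()
front.insert(0, 'id', row_id)
```

Both variants leave `normalized` itself unchanged, which can help when the intermediate frames are reused later.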

You are just missing the step of assigning the id to your dataframe after you normalize the columns:

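import io

out = {}
for i in range(len(df)):
    # wrap the string in StringIO: newer pandas deprecates passing literal JSON to read_json
    out[i] = pd.read_json(io.StringIO(df.i2[i]))
    out[i] = pd.json_normalize(out[i].q)
    out[i]['id'] = df.id[i]  # the scalar is broadcast to every row
    out[i] = out[i].loc[:, ['id', 'qi', 'answers']]  # move 'id' to the front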

Output:

>>> out[0]
      id  qi                                                                                                                                                                                                                     answers
0  9clpa  01                                                                                                [{'answer': 'M', 'value': '1'}, {'answer': 'F', 'value': '2'}, {'answer': 'G', 'value': '3'}, {'answer': 'V', 'value': '4'}]
1  9clpa  02  [{'answer': 'M', 'value': '1'}, {'answer': 'F', 'value': '2'}, {'answer': 'A', 'value': '3'}, {'answer': 'B', 'value': '4'}, {'answer': 'G', 'value': '5'}, {'answer': 'NC', 'value': '6'}, {'answer': 'O', 'value': '7'}]
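A possible variation on the loop above, offered only as a sketch: parse each string with `json.loads` and pass the `q` list straight to `pd.json_normalize`, skipping the intermediate `pd.read_json` frame. The sample df here is a shortened stand-in for the one in the question (same structure, fewer answers), and keying `out` by the Id rather than by position is an assumption made for this example, as is the `combined` frame at the end.

```python
import json
import pandas as pd

# Shortened stand-in for the question's df (same structure, fewer answers)
df = pd.DataFrame({
    'id': ['9clpa', 'g659am'],
    'i2': [
        '{"t":"unique678","q":[{"qi":"01","answers":[{"answer":"M","value":"1"}]},'
        '{"qi":"02","answers":[{"answer":"F","value":"2"}]}]}',
        '{"t":"unique428","q":[{"qi":"01","answers":[{"answer":"G","value":"3"}]},'
        '{"qi":"02","answers":[{"answer":"V","value":"4"}]}]}',
    ],
})

out = {}
for row_id, raw in zip(df['id'], df['i2']):
    parsed = json.loads(raw)                # plain dict, no intermediate DataFrame
    frame = pd.json_normalize(parsed['q'])  # one row per question: qi, answers
    frame.insert(0, 'id', row_id)           # repeat the Id on every row
    out[row_id] = frame

# Optional: stack the per-Id frames into a single dataframe
combined = pd.concat(list(out.values()), ignore_index=True)
```

With this layout, `out['9clpa']` returns the frame for that Id directly, and `combined` keeps the `id` column so rows can still be traced back to the master df.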


