python pandas - Add the unique ID from a column in the main df back to the processed dfs stored in a list of dataframes

Question

I have a single df where each row contains a JSON string that needs to be read and normalized.

I can read the JSON info and normalize the columns by storing each row as a new dataframe in a list, which I have done with the code below.

However, I also need to add the original unique id from the original df (i.e. 'id': ['9clpa','g659am']), which is lost in my current code.

The expected output is a list of dataframes, one per id, containing the exploded JSON info plus an additional id column (repeated on every row of each resulting df).

I hope that makes sense; any suggestions are very welcome. Thanks so much!

Dataframe

import pandas as pd
df = pd.DataFrame(data={'id': ['9clpa','g659am'],'i2': [('{"t":"unique678","q":[{"qi":"01","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}]},{"qi":"02","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"} ]}]}'),('{"t":"unique428","q":[{"qi":"01","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}]},{"qi":"02","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"} ]}]}')]})

Current code

out={}
for i in range(len(df)):
    out[i] = pd.read_json(df.i2[i])         # parse the JSON string in row i
    out[i] = pd.json_normalize(out[i].q)    # columns qi, answers; the row's 'id' is not carried over

Expected output

pd.DataFrame(data={'id': ['9clpa','9clpa'],'qi': ['01','02'], 'answers': ['{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}', '{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"}']})
pd.DataFrame(data={'id': ['g659am','g659am'],'qi': ['01','02'], 'answers': ['{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}', '{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"}']})

Answer 1

df = pd.DataFrame(data={'id': ['9clpa','g659am'],'i2': [('{"t":"unique678","q":[{"qi":"01","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}]},{"qi":"02","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"} ]}]}'),('{"t":"unique428","q":[{"qi":"01","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}]},{"qi":"02","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"} ]}]}')]})
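out={}
columns1 = ['id','qi','answers']   # desired column order, with 'id' first
for i in range(len(df)):
    out[i] = pd.read_json(df.i2[i])                      # parse the JSON string for row i
    out[i] = pd.json_normalize(out[i].q)                 # one row per entry in "q"
    df_new = pd.DataFrame(data=out[i],columns=columns1)  # reorder; 'id' does not exist yet, so it is created as NaN
    df_new = df_new.assign(id = lambda x: df.id[i])      # fill 'id' with this row's id on every row
    out[i] = df_new                                      # keep the result instead of discarding it
    display(df_new)                                      # Jupyter/IPython only; use print() elsewhere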

You can add a lambda function that assigns the value of 'id' to the newly created df.
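The lambda above ignores its argument `x` and returns the scalar `df.id[i]`; `DataFrame.assign` calls a callable with the frame and repeats a scalar result on every row. A small sketch, using a hypothetical two-row frame `normalized` as a stand-in for the `json_normalize` output, suggesting that passing the scalar directly gives the same column:

```python
import pandas as pd

# Hypothetical stand-in for the output of pd.json_normalize(...) for one row
normalized = pd.DataFrame({
    "qi": ["01", "02"],
    "answers": [[{"answer": "M", "value": "1"}], [{"answer": "F", "value": "2"}]],
})

# A callable is called with the frame; its scalar result is repeated on every row
with_lambda = normalized.assign(id=lambda x: "9clpa")

# Passing the scalar itself produces the same column
with_scalar = normalized.assign(id="9clpa")

print(with_lambda["id"].tolist())  # ['9clpa', '9clpa']
print(with_scalar["id"].tolist())  # ['9clpa', '9clpa']
```

Because `assign` evaluates the lambda immediately inside the loop, it picks up the current `i`; there is no late-binding surprise here.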

Edit: You can set the position of the 'id' column in columns1, which defines where it appears when you create the dataframe.

Output dataframe: [output image not preserved]
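Following the Edit about `columns1`, here is a sketch of two ways to control where `id` ends up, using a hypothetical frame `normalized` with placeholder answers. `DataFrame.insert` places the column at a given position in place, while selecting `columns1` after `assign` reorders the result:

```python
import pandas as pd

# Hypothetical stand-in for one normalized frame (answers are placeholders)
normalized = pd.DataFrame({"qi": ["01", "02"], "answers": ["a1", "a2"]})

# Option 1: DataFrame.insert puts the column at position 0 (modifies the frame in place)
first = normalized.copy()
first.insert(0, "id", "9clpa")

# Option 2: add the column, then select columns in the order listed in columns1
columns1 = ["id", "qi", "answers"]
second = normalized.assign(id="9clpa")[columns1]

print(list(first.columns))   # ['id', 'qi', 'answers']
print(list(second.columns))  # ['id', 'qi', 'answers']
```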

Answer 2

You are just missing the step that assigns the id to your dataframe after normalizing the columns:

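out={}
for i in range(len(df)):
    out[i] = pd.read_json(df.i2[i])                  # parse the JSON string for row i
    out[i] = pd.json_normalize(out[i].q)             # one row per entry in "q": qi, answers
    out[i]['id'] = df.id[i]                          # repeat the original id on every row
    out[i] = out[i].loc[:, ['id','qi','answers']]    # put 'id' first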

Output:

>>> out[0]
      id  qi                                                                                                                                                                                                                     answers
0  9clpa  01                                                                                                [{'answer': 'M', 'value': '1'}, {'answer': 'F', 'value': '2'}, {'answer': 'G', 'value': '3'}, {'answer': 'V', 'value': '4'}]
1  9clpa  02  [{'answer': 'M', 'value': '1'}, {'answer': 'F', 'value': '2'}, {'answer': 'A', 'value': '3'}, {'answer': 'B', 'value': '4'}, {'answer': 'G', 'value': '5'}, {'answer': 'NC', 'value': '6'}, {'answer': 'O', 'value': '7'}]
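The question asks for a list, while both answers fill a dict keyed by row position. Below is a minimal sketch of the same approach that builds a list directly, assuming a shortened copy of the question's data (only the first answer of each question is kept). It parses with `json.loads` instead of `pd.read_json`, since pandas 2.1 deprecated passing literal JSON strings to `read_json`. The names `frames`, `row_id`, `raw`, `parsed` and `part` are illustrative:

```python
import json
import pandas as pd

# Shortened copy of the question's data (first answer of each question only)
df = pd.DataFrame({
    "id": ["9clpa", "g659am"],
    "i2": [
        '{"t":"unique678","q":[{"qi":"01","answers":[{"answer":"M","value":"1"}]},'
        '{"qi":"02","answers":[{"answer":"M","value":"1"}]}]}',
        '{"t":"unique428","q":[{"qi":"01","answers":[{"answer":"M","value":"1"}]},'
        '{"qi":"02","answers":[{"answer":"M","value":"1"}]}]}',
    ],
})

frames = []
# zip keeps each id paired with its own JSON string, so no positional lookup is needed
for row_id, raw in zip(df["id"], df["i2"]):
    parsed = json.loads(raw)               # dict with keys "t" and "q"
    part = pd.json_normalize(parsed["q"])  # one row per question: qi, answers
    part.insert(0, "id", row_id)           # repeat the original id on every row
    frames.append(part)

print(frames[1])
```

If a single frame is more convenient later, `pd.concat(frames, ignore_index=True)` stacks the list, and the `id` column still identifies the source row.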

Answer 3

You can use pd.json_normalize (docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html).

(from https://medium.com/swlh/converting-nested-json-structures-to-pandas-dataframes-e8106c59976e)
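This answer only points at `json_normalize`. A sketch of how its `record_path` and `meta` parameters could carry the id, assuming the outer `id` is first copied into each parsed record (that copy step is an assumption of this sketch, not something the answer describes). The data is a shortened copy of the question's, and `frames`, `record` and `flat` are illustrative names:

```python
import json
import pandas as pd

# Shortened copy of the question's data (first answer of each question only)
df = pd.DataFrame({
    "id": ["9clpa", "g659am"],
    "i2": [
        '{"t":"unique678","q":[{"qi":"01","answers":[{"answer":"M","value":"1"}]},'
        '{"qi":"02","answers":[{"answer":"M","value":"1"}]}]}',
        '{"t":"unique428","q":[{"qi":"01","answers":[{"answer":"M","value":"1"}]},'
        '{"qi":"02","answers":[{"answer":"M","value":"1"}]}]}',
    ],
})

frames = []
for row_id, raw in zip(df["id"], df["i2"]):
    record = json.loads(raw)
    # Assumption of this sketch: copy the outer id into the record so that
    # json_normalize can repeat it through meta
    record["id"] = row_id
    # record_path="q": one output row per entry of the "q" list
    # meta=["id", "t"]: top-level fields repeated on each of those rows
    flat = pd.json_normalize(record, record_path="q", meta=["id", "t"])
    frames.append(flat[["id", "qi", "answers"]])

print(frames[0])
```

The `t` value is also available in `flat` if it is needed; it is dropped here only to match the expected output in the question.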

3 answers

Answer 1: score 1, posted 2021-05-29 11:36:59

Answer 2: score 0, accepted, posted 2021-05-29 11:43:33

Answer 3: score -1, posted 2021-05-29 11:43:05