python pandas - add unique Ids in column from master df back in to processed dfs stored in list of dataframes
I have a single df that includes multiple json strings per row that need reading and normalizing.
I can read out the json info and normalize the columns by storing each row as a new dataframe in a list, which I have done with the code below.
However, I need to carry over the original unique Id from the original df (i.e. 'id': ['9clpa','g659am']), which is lost in my current code.
The expected output is a list of dataframes, one per Id, that include the exploded json info plus an additional column containing the Id (repeated for each row of the final df).
I hope that makes sense; any suggestions are very welcome. Thanks so much.
dataframe
import pandas as pd
df = pd.DataFrame(data={'id': ['9clpa','g659am'],'i2': [('{"t":"unique678","q":[{"qi":"01","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}]},{"qi":"02","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"} ]}]}'),('{"t":"unique428","q":[{"qi":"01","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}]},{"qi":"02","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"} ]}]}')]})
current code
out={}
for i in range(len(df)):
    out[i] = pd.read_json(df.i2[i])
    out[i] = pd.json_normalize(out[i].q)
expected output
pd.DataFrame(data={'id': ['9clpa','9clpa'],'qi': ['01','02'], 'answers': ['{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}', '"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"']})
pd.DataFrame(data={'id': ['g659am','g659am'],'qi': ['01','02'], 'answers': ['{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}', '"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"']})
df = pd.DataFrame(data={'id': ['9clpa','g659am'],'i2': [('{"t":"unique678","q":[{"qi":"01","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}]},{"qi":"02","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"} ]}]}'),('{"t":"unique428","q":[{"qi":"01","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}]},{"qi":"02","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"} ]}]}')]})
out={}
columns1 = ['id','qi','answers']
for i in range(len(df)):
    out[i] = pd.read_json(df.i2[i])
    out[i] = pd.json_normalize(out[i].q)
    # 'id' does not exist yet, so it is created empty here and filled in below
    df_new = pd.DataFrame(data=out[i],columns=columns1)
    df_new = df_new.assign(id = lambda x: df.id[i])
    display(df_new)  # display() is available in IPython/Jupyter
You can use assign with a lambda function to set the value of 'id' on each newly formed df.
Edit: The position of 'id' in columns1 determines where the column appears when the dataframe is created.
Output dataframe:
You are just missing the step of assigning the id to your dataframe after you normalize the columns:
out={}
for i in range(len(df)):
    out[i] = pd.read_json(df.i2[i])
    out[i] = pd.json_normalize(out[i].q)
    # copy the id from the matching row of the original df
    out[i]['id'] = df.id[i]
    # reorder so that 'id' comes first
    out[i] = out[i].loc[:, ['id','qi','answers']]
Output:
>>> out[0]
id qi answers
0 9clpa 01 [{'answer': 'M', 'value': '1'}, {'answer': 'F', 'value': '2'}, {'answer': 'G', 'value': '3'}, {'answer': 'V', 'value': '4'}]
1 9clpa 02 [{'answer': 'M', 'value': '1'}, {'answer': 'F', 'value': '2'}, {'answer': 'A', 'value': '3'}, {'answer': 'B', 'value': '4'}, {'answer': 'G', 'value': '5'}, {'answer': 'NC', 'value': '6'}, {'answer': 'O', 'value': '7'}]
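In the output above, each 'answers' cell still holds a list of dicts. If one row per answer is wanted instead, a minimal sketch (assuming the JSON is parsed with json.loads, and using a shortened sample string rather than the full data from the question) is to pass record_path and meta to pd.json_normalize:

```python
import json
import pandas as pd

# Shortened sample in the same shape as one 'i2' cell (assumption: real data has more answers).
raw = (
    '{"t":"unique678","q":['
    '{"qi":"01","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"}]},'
    '{"qi":"02","answers":[{"answer":"A","value":"3"}]}]}'
)

parsed = json.loads(raw)

# record_path walks into each question's 'answers' list;
# meta repeats the parent 'qi' on every resulting row.
flat = pd.json_normalize(parsed["q"], record_path="answers", meta=["qi"])

# Put the id from the original df in front (hard-coded here for the sketch).
flat.insert(0, "id", "9clpa")
print(flat)
```

Whether this shape is preferable depends on how the answers are used later; the expected output in the question keeps them in a single column.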
You can use .json_normalize (doc here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html )
(from https://medium.com/swlh/converting-nested-json-structures-to-pandas-dataframes-e8106c59976e )
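As a rough sketch of how json_normalize could be applied to this question, the loop below parses each string with json.loads and normalizes the 'q' list directly. It is one possible variant, not necessarily what this answer intended: the sample frame is a shortened stand-in for the question's df, and keying the result by id instead of by row position is a choice made for the sketch. Parsing with json.loads also sidesteps the warning that recent pandas versions raise when a literal JSON string is passed to pd.read_json.

```python
import json
import pandas as pd

# Shortened stand-in for the question's df (assumption: same structure, fewer answers).
df = pd.DataFrame({
    "id": ["9clpa", "g659am"],
    "i2": [
        '{"t":"unique678","q":[{"qi":"01","answers":[{"answer":"M","value":"1"}]},'
        '{"qi":"02","answers":[{"answer":"F","value":"2"}]}]}',
        '{"t":"unique428","q":[{"qi":"01","answers":[{"answer":"G","value":"3"}]},'
        '{"qi":"02","answers":[{"answer":"V","value":"4"}]}]}',
    ],
})

out = {}
for row_id, raw in zip(df["id"], df["i2"]):
    parsed = json.loads(raw)                # plain dict with keys 't' and 'q'
    frame = pd.json_normalize(parsed["q"])  # one row per question: qi, answers
    frame.insert(0, "id", row_id)           # repeat the id on every row, as first column
    out[row_id] = frame

print(out["9clpa"])
```

If a list is needed rather than a dict, list(out.values()) gives the frames in the original row order.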