list of dictionaries into data frame columns
I have a column in a DataFrame that contains JSON strings, each representing a list of dictionaries:
id Number Type Class Name datiles
0 292 C 1 2 A [{"did":{"id":"3","num":1},"NId":"a1,b1,c1","Att":null,"isnull":false,"number":"M90","label":[{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}},{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}}],"codes":[],"rule":null}]
1 293 C 1 2 A [{"did":{"id":"3","num":1},"NId":"a1,b1,c1","Att":null,"isnull":false,"number":"M90","label":[{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}},{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}}],"codes":[],"rule":null}]
I want to expand each entry in the datiles column into rows and columns and join the result with the original data frame, as in the sample below:
id Number Type Class Name did NId Att ..... .... label ........
0 292 C 1 2 A {"id":"3","num":1} a1,b1,c1 null [{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}},{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}}]
I have managed to build the expanded frame I need, but I don't know how to join it with the original data frame, since there is no key between them:
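import json
import pandas as pd

df['datiles'] = data['datiles'].apply(json.loads)
df2 = pd.DataFrame([])
for x in df['datiles'].values.tolist():
    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    df2 = pd.concat([df2, pd.DataFrame(x)])
display(df2)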
How can I split the column and join at the same time? I have tried to use json_normalize, but I get this error:
AttributeError: 'list' object has no attribute 'values'
I have also looked at the following posts, but their answers did not work for me, possibly because of the list structure:
How to convert python JSON rows to dataframe columns without looping
Pandas split column of lists into multiple columns
How to split a list of dictionaries into multiple columns keeping the same index?
You can use the index of your df and explicitly set it on the new DataFrame to join with, like this:
import json
import pandas as pd

df['datiles'] = df['datiles'].apply(json.loads).apply(pd.DataFrame)
out = df.drop('datiles', axis=1).join(
    pd.concat(df['datiles'].values, keys=df.index).droplevel(1))
Explanation
The first line does a double apply: json.loads (as you had figured out) and pd.DataFrame (which you had figured out too, but here we do it in an apply instead of a loop).
The second line concatenates all the DataFrames inside df['datiles'], but uses the index of df itself as keys. The result has a MultiIndex, with possibly several rows for a given key (if the original datiles JSON string was a list of more than one element). In any case, we drop that second level. Then join does its usual thing (on indexes).
Example
The setup is a bit verbose for a SO answer (I wish we had an expand or fold macro), so I pasted it in pastebin.
The point is, the first datiles is a JSON list of two elements, just to exercise the logic above. Aside from that, it's the same content as per the OP.
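For anyone who wants to try this without the pastebin, a minimal sketch of a comparable setup might look like the following. The helper make_item and the exact field values are my own assumptions, modelled on the OP's sample. The per-row frames are built here with a list comprehension rather than the double apply, which should be equivalent for this data.

```python
import json

import pandas as pd


def make_item(did_id):
    """Hypothetical helper: one dictionary shaped like the OP's sample."""
    return {
        "did": {"id": did_id, "num": 1},
        "NId": "a1,b1,c1",
        "Att": None,
        "isnull": False,
        "number": "M90",
        "label": [{"title": "Dear", "Info": {"Id": None, "id2": 2, "Name": "x"}}],
        "codes": [],
        "rule": None,
    }


# The first row's JSON list holds two dictionaries, the second row's holds one.
df = pd.DataFrame({
    "id": [292, 293],
    "Number": ["C", "C"],
    "Type": [1, 1],
    "Class": [2, 2],
    "Name": ["A", "A"],
    "datiles": [
        json.dumps([make_item("1"), make_item("2")]),
        json.dumps([make_item("3")]),
    ],
})

# Parse each JSON string and build one small DataFrame per original row.
frames = [pd.DataFrame(json.loads(s)) for s in df["datiles"]]

# Stack the small frames, labelled by the index of the row they came from,
# drop the inner level, then join back onto the remaining columns by index.
expanded = pd.concat(frames, keys=df.index).droplevel(1)
out = df.drop("datiles", axis=1).join(expanded)

print(out[["id", "NId", "number"]])
```

Row 0 should appear twice in out, once per dictionary in its JSON list, and row 1 once.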
Output
id Number Type Class Name did NId Att \
0 292 C 1 2 A {'id': '1', 'num': 1} a1,b1,c1 None
0 292 C 1 2 A {'id': '2', 'num': 1} a1,b1,c1 None
1 293 C 1 2 A {'id': '3', 'num': 1} a1,b1,c1 None
isnull number label codes \
0 False M90 [{'title': 'Dear', 'Info': {'Id': None, 'id2':... []
0 False M90 [{'title': 'Dear', 'Info': {'Id': None, 'id2':... []
1 False M90 [{'title': 'Dear', 'Info': {'Id': None, 'id2':... []
rule
0 None
0 None
1 None
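The repeated 0 in the index above comes from the keys passed to pd.concat. If it helps, here is a small sketch of that intermediate step in isolation. The frames and the parent_index name are made up for illustration and stand in for the parsed datiles entries.

```python
import pandas as pd

# Hypothetical stand-ins for the per-row frames: the first original row
# produced two dictionaries, the second produced one.
frames = [
    pd.DataFrame([{"number": "M90"}, {"number": "M91"}]),
    pd.DataFrame([{"number": "M92"}]),
]
parent_index = pd.Index([0, 1])

# keys= labels each frame with the row it came from, giving a two-level
# index: (parent row, position inside the JSON list).
stacked = pd.concat(frames, keys=parent_index)
print(stacked.index.tolist())  # [(0, 0), (0, 1), (1, 0)]

# Dropping the inner level leaves only the parent row label, repeated
# once per dictionary. This is the index that join() aligns on.
flat = stacked.droplevel(1)
print(flat.index.tolist())  # [0, 0, 1]
```

Because join aligns on these labels, no explicit key column is needed between the two frames.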