I have a column in a DataFrame
that contains JSON strings, each representing a list of dictionaries:
id Number Type Class Name datiles
0 292 C 1 2 A [{"did":{"id":"3","num":1},"NId":"a1,b1,c1","Att":null,"isnull":false,"number":"M90","label":[{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}},{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}}],"codes":[],"rule":null}]
1 293 C 1 2 A [{"did":{"id":"3","num":1},"NId":"a1,b1,c1","Att":null,"isnull":false,"number":"M90","label":[{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}},{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}}],"codes":[],"rule":null}]
I want to convert each row in datiles column to rows and columns and join them with the original data frame as shown on the sample below:
id Number Type Class Name did NId Att ..... .... label ........
0292 C 1 2 A {"id":"3","num":1} a1,b1,c1 null [{"title":"Dear","Info"{"Id":null,"id2":2,"Name":"x"}},{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}}]
I have done this as I need but I don't know how to join it with the original data frame since I don't have a key between them:
df['datiles']=data['datiles'].apply(json.loads)
df2 = pd.DataFrame([])
for x in df['datiles'].values.tolist():
df2 = df2 .append(pd.DataFrame(x))
display(df2)
How can I split the column and join at the same time? I have tried to use json_normalize but I get this error
AttributeError: 'list' object has no attribute 'values'
Also, I have seen those posts but does not work, may that because of the list structure
How to convert python JSON rows to dataframe columns without looping
Pandas split column of lists into multiple columns
How to split a list of dictionaries into multiple columns keeping the same index?
You can use the index of your df
and explicitly set it to your new DataFrame
to join with, like that:
df['datiles'] = df['datiles'].apply(json.loads).apply(pd.DataFrame)
out = df.drop('datiles', axis=1).join(
pd.concat(df['datiles'].values, keys=df.index).droplevel(1))
Explanation
apply
: json.loads
(as you had figured out) and pd.DataFrame
(that you had figured out too, but here we do it in an apply
instead of a loop).DataFrames
inside df['datiles']
, but uses the index of df
itself as keys. The result is a MultiIndex
, with possibly several rows for a given key (if the original datiles
JSON string was a list of more than 1 element). In any case, we drop that second level. Then join
does its usual thing (on indexes).Example
The setup is a bit verbose for a SO answer (I wish we had an expand
or fold
macro), so I pasted it in pastebin .
The point is, the first datiles
is a JSON list of two elements, just to exercise the logic above. Aside from that, it's the same content as per the OP.
Output
id Number Type Class Name did NId Att \
0 292 C 1 2 A {'id': '1', 'num': 1} a1,b1,c1 None
0 292 C 1 2 A {'id': '2', 'num': 1} a1,b1,c1 None
1 293 C 1 2 A {'id': '3', 'num': 1} a1,b1,c1 None
isnull number label codes \
0 False M90 [{'title': 'Dear', 'Info': {'Id': None, 'id2':... []
0 False M90 [{'title': 'Dear', 'Info': {'Id': None, 'id2':... []
1 False M90 [{'title': 'Dear', 'Info': {'Id': None, 'id2':... []
rule
0 None
0 None
1 None
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.