
List of dictionaries into data frame columns

I have a column in a DataFrame that contains JSON strings, each representing a list of dictionaries:

    id Number  Type  Class Name                                                                                                                                                                                                                                datiles
0  292      C     1      2    A  [{"did":{"id":"3","num":1},"NId":"a1,b1,c1","Att":null,"isnull":false,"number":"M90","label":[{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}},{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}}],"codes":[],"rule":null}]
1  293      C     1      2    A  [{"did":{"id":"3","num":1},"NId":"a1,b1,c1","Att":null,"isnull":false,"number":"M90","label":[{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}},{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}}],"codes":[],"rule":null}]

I want to expand each entry in the datiles column into rows and columns and join them with the original DataFrame, as in the sample below:

       id     Number       Type     Class      Name            did                NId        Att  ..... .... label ........
      0292           C          1          2        A     {"id":"3","num":1}    a1,b1,c1    null    [{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}},{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}}]

I have managed to expand the column as needed, but I don't know how to join the result with the original DataFrame, since there is no key shared between them:

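import json
import pandas as pd

df['datiles'] = df['datiles'].apply(json.loads)

# Stack the parsed records of every row into one frame.
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
df2 = pd.concat([pd.DataFrame(x) for x in df['datiles']])
display(df2)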

How can I split the column and join at the same time? I have tried to use json_normalize, but I get this error:

AttributeError: 'list' object has no attribute 'values'

Also, I have seen the following posts, but their solutions do not work for me, possibly because of the list structure:

How to convert python JSON rows to dataframe columns without looping

Pandas split column of lists into multiple columns

How to split a list of dictionaries into multiple columns keeping the same index?

You can reuse the index of your df and explicitly set it on the new DataFrame to join with, like this:

df['datiles'] = df['datiles'].apply(json.loads).apply(pd.DataFrame)
out = df.drop('datiles', axis=1).join(
    pd.concat(df['datiles'].values, keys=df.index).droplevel(1))

Explanation

  1. The first line does a double apply: json.loads (as you had figured out) and pd.DataFrame (which you had figured out too, but here we do it in an apply instead of a loop).
  2. The second line concatenates all the DataFrames inside df['datiles'], but uses the index of df itself as keys. The result has a MultiIndex, with possibly several rows for a given key (if the original datiles JSON string was a list of more than one element). In any case, we drop that second level. Then join does its usual thing (on indexes).
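The two steps above can be sketched end to end as follows. This is only a minimal illustration: the two-row frame and its JSON contents are made up for the example (they are not the OP's data), and a list comprehension stands in for the double apply, which should be equivalent here.

```python
import json
import pandas as pd

# Hypothetical sample: the first JSON list has two records, the second has one.
df = pd.DataFrame({
    "id": [292, 293],
    "Name": ["A", "A"],
    "datiles": [
        '[{"NId": "a1", "number": "M90"}, {"NId": "b1", "number": "M91"}]',
        '[{"NId": "c1", "number": "M92"}]',
    ],
})

# Step 1: parse each JSON string and build one small DataFrame per row.
parts = [pd.DataFrame(json.loads(s)) for s in df["datiles"]]

# Step 2: concatenate with the outer index as keys. The result carries a
# MultiIndex of (original row label, position inside the JSON list).
stacked = pd.concat(parts, keys=df.index)

# Drop the inner level so only the original row label remains, then join on it.
out = df.drop(columns="datiles").join(stacked.droplevel(1))
print(out)
```

Row label 0 appears twice in out because its JSON list held two records; the columns of df are repeated for each of them.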

Example

The setup is a bit verbose for an SO answer (I wish we had an expand or fold macro), so I pasted it in pastebin.

The point is that the first datiles value is a JSON list of two elements, just to exercise the logic above. Aside from that, the content is the same as in the OP.

Output

    id Number  Type  Class Name                    did       NId   Att  \
0  292      C     1      2    A  {'id': '1', 'num': 1}  a1,b1,c1  None   
0  292      C     1      2    A  {'id': '2', 'num': 1}  a1,b1,c1  None   
1  293      C     1      2    A  {'id': '3', 'num': 1}  a1,b1,c1  None   

   isnull number                                              label codes  \
0   False    M90  [{'title': 'Dear', 'Info': {'Id': None, 'id2':...    []   
0   False    M90  [{'title': 'Dear', 'Info': {'Id': None, 'id2':...    []   
1   False    M90  [{'title': 'Dear', 'Info': {'Id': None, 'id2':...    []   

   rule  
0  None  
0  None  
1  None
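Since the question mentions json_normalize, here is a hedged sketch of a different route that is not part of the answer above: explode the parsed lists so each dictionary gets its own row, then pass a plain list of dicts to pd.json_normalize. The sample data is hypothetical, and the sketch assumes every JSON list is non-empty (an empty list would explode to NaN, which json_normalize does not accept). Note that, unlike the output above, nested dictionaries such as did are flattened into dotted columns.

```python
import json
import pandas as pd

# Hypothetical data; the nested "did" dict shows how json_normalize flattens.
df = pd.DataFrame({
    "id": [292, 293],
    "datiles": [
        '[{"did": {"id": "1"}, "NId": "a1"}, {"did": {"id": "2"}, "NId": "b1"}]',
        '[{"did": {"id": "3"}, "NId": "c1"}]',
    ],
})

# Parse the JSON, then give every dict its own row; index labels repeat.
exploded = df.assign(datiles=df["datiles"].apply(json.loads)).explode("datiles")

# json_normalize takes a list of dicts and returns a fresh 0..n-1 index.
flat = pd.json_normalize(exploded["datiles"].tolist())

# Align both sides positionally, then restore the original row labels.
left = exploded.drop(columns="datiles").reset_index(drop=True)
out = pd.concat([left, flat], axis=1)
out.index = exploded.index
print(out)
```

The positional alignment avoids joining two frames that both have repeated index labels, which would otherwise pair every row for label 0 with every other row for label 0.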
