pandas column containing a list of objects: split the column by key name and store the values as comma-separated strings

I have a DataFrame that contains this column:

A
[{"A": 28, "B": "abc"},{"A": 29, "B": "def"},{"A": 30, "B": "hij"}]
[{"A": 31, "B": "hij"},{"A": 32, "B": "abc"}]
[{"A": 28, "B": "abc"}]
[{"A": 28, "B": "abc"},{"A": 29, "B": "def"},{"A": 30, "B": "hij"}]
[{"A": 28, "B": "abc"},{"A": 29, "B": "klm"},{"A": 30, "B": "nop"}]
[{"A": 28, "B": "abc"},{"A": 29, "B": "xyz"}]

The output should be:

A              B
28,29,30       abc,def,hij
31,32          hij,abc
28             abc
28,29,30       abc,def,hij
28,29,30       abc,klm,nop
28,29          abc,xyz

How do I split the list of objects into columns based on the key names, with the values stored as comma-separated strings as shown above?

Use stack, then groupby:

(df.A.apply(pd.Series)
   .stack()
   .apply(pd.Series)
   .groupby(level=0)
   .agg(lambda x: ','.join(x.astype(str))))
Out[457]: 
          A            B
0  28,29,30  abc,def,hij
1     31,32      hij,abc
2        28          abc
3  28,29,30  abc,def,hij
4  28,29,30  abc,klm,nop

Data input:

df = pd.DataFrame({'A': [
    [{"A": 28, "B": "abc"},{"A": 29, "B": "def"},{"A": 30, "B": "hij"}],
    [{"A": 31, "B": "hij"},{"A": 32, "B": "abc"}],
    [{"A": 28, "B": "abc"}],
    [{"A": 28, "B": "abc"},{"A": 29, "B": "def"},{"A": 30, "B": "hij"}],
    [{"A": 28, "B": "abc"},{"A": 29, "B": "klm"},{"A": 30, "B": "nop"}]
]})
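As a hedged alternative to the stack-based chain, the same reshape can be sketched with Series.explode, which is a different technique from the one in this answer and assumes pandas 0.25 or later. The names exploded, flat and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [
    [{"A": 28, "B": "abc"}, {"A": 29, "B": "def"}, {"A": 30, "B": "hij"}],
    [{"A": 31, "B": "hij"}, {"A": 32, "B": "abc"}],
    [{"A": 28, "B": "abc"}],
]})

# One dict per row; the original row label is repeated in the index.
exploded = df["A"].explode()

# Turn each dict into columns while keeping the repeated row labels.
flat = pd.DataFrame(exploded.tolist(), index=exploded.index)

# Join the values that belong to each original row into one string.
result = flat.astype(str).groupby(level=0).agg(",".join)
print(result)
```

Grouping on level=0 works here because explode keeps the original row label on every element it produces.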

For your additional question about reading the data from a CSV file:

import ast
import pandas as pd

df = pd.read_csv(r'your.csv', dtype={'A': object})

# Each cell is read as a string; parse it back into a list of dicts.
df['A'] = df['A'].apply(ast.literal_eval)
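To see why the parsing step is needed, here is a small round-trip sketch. It uses an in-memory buffer instead of a real file, so the file contents are an assumption for illustration, and it parses the column through the converters argument of read_csv rather than a separate apply.

```python
import ast
import io

import pandas as pd

original = pd.DataFrame({"A": [
    [{"A": 28, "B": "abc"}, {"A": 29, "B": "def"}],
    [{"A": 31, "B": "hij"}],
]})

# Writing to CSV stores each list as its string representation.
buffer = io.StringIO()
original.to_csv(buffer, index=False)

# Reading it back without help yields plain strings, not lists.
buffer.seek(0)
raw = pd.read_csv(buffer)
print(type(raw.loc[0, "A"]))

# converters applies ast.literal_eval to every cell of column A while reading.
buffer.seek(0)
parsed = pd.read_csv(buffer, converters={"A": ast.literal_eval})
print(type(parsed.loc[0, "A"]))
```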

I'm assuming A is a list of lists of dicts:

A = [
    [{"A": 28, "B": "abc"},{"A": 29, "B": "def"},{"A": 30, "B": "hij"}],
    [{"A": 31, "B": "hij"},{"A": 32, "B": "abc"}],
    [{"A": 28, "B": "abc"}],
    [{"A": 28, "B": "abc"},{"A": 29, "B": "def"},{"A": 30, "B": "hij"}],
    [{"A": 28, "B": "abc"},{"A": 29, "B": "klm"},{"A": 30, "B": "nop"}],
    [{"A": 28, "B": "abc"},{"A": 29, "B": "xyz"}]
]

The first thing I'd do is use a comprehension to create a new dictionary, then apply ','.join within a groupby:

B = {
    (i, j, k): v
    for j, row in enumerate(A)
    for i, d in enumerate(row)
    for k, v in d.items()
}

pd.Series(B).astype(str).groupby(level=[1, 2]).apply(','.join).unstack()

          A            B
0  28,29,30  abc,def,hij
1     31,32      hij,abc
2        28          abc
3  28,29,30  abc,def,hij
4  28,29,30  abc,klm,nop
5     28,29      abc,xyz
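The key layout is what makes the groupby work, so here is a sketch on a smaller input with the index levels named. The level names position, row and key are my own labels, not part of the answer.

```python
import pandas as pd

A = [
    [{"A": 28, "B": "abc"}, {"A": 29, "B": "def"}],
    [{"A": 31, "B": "hij"}],
]

# Keys are (position inside the row, row number, dict key).
B = {
    (i, j, k): v
    for j, row in enumerate(A)
    for i, d in enumerate(row)
    for k, v in d.items()
}

s = pd.Series(B).astype(str)
s.index.names = ["position", "row", "key"]  # illustrative names only

# Grouping on (row, key) collects every position for that row and key.
joined = s.groupby(level=["row", "key"]).apply(",".join)

# Moving the key level into the columns gives one row per original row.
result = joined.unstack("key")
print(result)
```

Grouping by levels 1 and 2 in the answer is the same as grouping by row and key here; the position level only records the order of values inside each group.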

Thought I'd take a shot at this. First, never use eval where you can avoid it. A better solution is to use ast:

import ast
df.A = df.A.apply(ast.literal_eval)
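A short sketch of the difference: ast.literal_eval accepts only Python literals, so a string that contains a function call is rejected instead of being executed. The sample strings are made up for illustration.

```python
import ast

text = '[{"A": 28, "B": "abc"}, {"A": 29, "B": "def"}]'
parsed = ast.literal_eval(text)  # plain literals are parsed into Python objects
print(parsed[0]["B"])

# A string containing a call is not a literal, so it raises instead of running.
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    rejected = True
else:
    rejected = False
print(rejected)
```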

Next, flatten your columns:

import numpy as np

i = df.A.str.len().cumsum()   # cumulative list lengths; we'll need this later
df = pd.DataFrame.from_dict(np.concatenate(df.A).tolist())
df.A = df.A.astype(str)

df

     A    B
0   28  abc
1   29  def
2   30  hij
3   31  hij
4   32  abc
5   28  abc
6   28  abc
7   29  def
8   30  hij
9   28  abc
10  29  klm
11  30  nop
12  28  abc
13  29  xyz

Now, perform a groupby using intervals made from i:

idx = pd.cut(df.index, bins=np.append([0], i), include_lowest=True, right=False)
df = df.groupby(idx, as_index=False).agg(','.join)

df

          A            B
0  28,29,30  abc,def,hij
1     31,32      hij,abc
2        28          abc
3  28,29,30  abc,def,hij
4  28,29,30  abc,klm,nop
5     28,29      abc,xyz
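To make the interval step concrete, here is a sketch of how the cumulative lengths become bin edges and how each flattened row position lands in a bin. The lengths 3, 2 and 1 are made up, and the sketch assumes no list is empty, since an empty list would repeat a bin edge.

```python
import numpy as np
import pandas as pd

lengths = pd.Series([3, 2, 1])        # number of dicts in each original row
i = lengths.cumsum()                  # 3, 5, 6
bins = np.append([0], i)              # bin edges 0, 3, 5, 6
positions = np.arange(lengths.sum())  # flattened row positions 0..5

# right=False makes the bins [0, 3), [3, 5), [5, 6).
groups = pd.cut(positions, bins=bins, right=False, include_lowest=True)
codes = groups.codes.tolist()
print(codes)  # [0, 0, 0, 1, 1, 2]
```

Each code is the number of the original row that the flattened row came from, which is why grouping on these intervals rebuilds one output row per input row.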

Had a little help from Bharath here.


A cool alternative to the IntervalIndex (proposed by Wen) involves the use of np.put:

i = df.A.str.len().cumsum()  
df = pd.DataFrame.from_dict(np.concatenate(df.A).tolist())
df.A = df.A.astype(str)

v = pd.Series(0, index=df.index)
np.put(v, i-1, [1] * len(i))

df = df.groupby(v[::-1].cumsum()).agg(','.join)[::-1].reset_index(drop=True)

df

          A            B
0  28,29,30  abc,def,hij
1     31,32      hij,abc
2        28          abc
3  28,29,30  abc,def,hij
4  28,29,30  abc,klm,nop
5     28,29      abc,xyz
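Calling np.put on a Series depends on the Series providing a put method, which may not be the case on newer pandas releases. As a hedged sketch of a different technique that yields the same grouping, the group labels can be built directly with np.repeat. The names lengths, group_ids and flat are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [
    [{"A": 28, "B": "abc"}, {"A": 29, "B": "def"}],
    [{"A": 31, "B": "hij"}],
]})

# Repeat each original row number once per dict in that row.
lengths = df["A"].str.len().to_numpy()
group_ids = np.repeat(np.arange(len(df)), lengths)

# Flatten the dicts into one row each, then join per original row.
flat = pd.DataFrame([d for row in df["A"] for d in row]).astype(str)
result = flat.groupby(group_ids).agg(",".join)
print(result)
```

Because the labels already run in ascending order, no reversal or reset_index step is needed afterwards.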

Performance

df = pd.concat([df] * 1000, ignore_index=True)

%%timeit
df.A.apply(pd.Series).stack().\
     apply(pd.Series).groupby(level=0).\
        agg(lambda x :','.join(x.astype(str)))

1 loop, best of 3: 8.76 s per loop

%%timeit
A = df.A.values.tolist()
B = {
    (i, j, k): v
    for j, row in enumerate(A)
    for i, d in enumerate(row)
    for k, v in d.items()
}    
pd.Series(B).astype(str).groupby(level=[1, 2]).apply(','.join).unstack()

1 loop, best of 3: 2.08 s per loop

%%timeit
i = df.A.str.len().cumsum() 
df2 = pd.DataFrame.from_dict(np.concatenate(df.A).tolist())
df2.A = df2.A.astype(str)
idx = pd.cut(df2.index, bins=np.append([0], i), include_lowest=True, right=False)
df2.groupby(idx, as_index=False).agg(','.join)

1 loop, best of 3: 810 ms per loop

%%timeit
i = df.A.str.len().cumsum() 
df2 = pd.DataFrame.from_dict(np.concatenate(df.A).tolist())
df2.A = df2.A.astype(str)
v = pd.Series(0, index=df2.index)
np.put(v, i-1, [1] * len(i))
df2.groupby(v[::-1].cumsum()).agg(','.join)[::-1].reset_index(drop=True)

1 loop, best of 3: 548 ms per loop
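The timings above come from IPython's %%timeit. Outside IPython, a comparable measurement can be sketched with the standard timeit module. The function name comprehension_join and the smaller repeat factor of 100 are my own choices, and the numbers it prints will differ from those above.

```python
import timeit

import pandas as pd

small = pd.DataFrame({"A": [
    [{"A": 28, "B": "abc"}, {"A": 29, "B": "def"}, {"A": 30, "B": "hij"}],
    [{"A": 31, "B": "hij"}, {"A": 32, "B": "abc"}],
    [{"A": 28, "B": "abc"}],
]})
big = pd.concat([small] * 100, ignore_index=True)


def comprehension_join(frame):
    """Dictionary-comprehension approach wrapped in a function for timing."""
    rows = frame["A"].tolist()
    flat = {
        (i, j, k): v
        for j, row in enumerate(rows)
        for i, d in enumerate(row)
        for k, v in d.items()
    }
    return pd.Series(flat).astype(str).groupby(level=[1, 2]).apply(",".join).unstack()


result = comprehension_join(big)

# Best of three single runs, similar in spirit to "best of 3" above.
seconds = min(timeit.repeat(lambda: comprehension_join(big), number=1, repeat=3))
print(f"best of 3: {seconds:.4f} s")
```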

