pandas column containing list of objects, split this column based upon keynames and store values as comma separated values
I have a dataframe with a column:
A
[{"A": 28, "B": "abc"},{"A": 29, "B": "def"},{"A": 30, "B": "hij"}]
[{"A": 31, "B": "hij"},{"A": 32, "B": "abc"}]
[{"A": 28, "B": "abc"}]
[{"A": 28, "B": "abc"},{"A": 29, "B": "def"},{"A": 30, "B": "hij"}]
[{"A": 28, "B": "abc"},{"A": 29, "B": "klm"},{"A": 30, "B": "nop"}]
[{"A": 28, "B": "abc"},{"A": 29, "B": "xyz"}]
The output should be:
A B
28,29,30 abc,def,hij
31,32 hij,abc
28 abc
28,29,30 abc,def,hij
28,29,30 abc,klm,nop
28,29 abc,xyz
How do I split the list of objects into separate columns based on the key names, and store the values as comma-separated strings, as shown above?
By using stack, then groupby:
df.A.apply(pd.Series).stack().\
apply(pd.Series).groupby(level=0).\
agg(lambda x :','.join(x.astype(str)))
Out[457]:
A B
0 28,29,30 abc,def,hij
1 31,32 hij,abc
2 28 abc
3 28,29,30 abc,def,hij
4 28,29,30 abc,klm,nop
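For reference, a sketch of the intermediate frame produced by apply(pd.Series).stack().apply(pd.Series) on the sample data, before the groupby joins each original row back together:

# MultiIndex of (original row, position within the list), one dict key per column
#        A    B
# 0 0   28  abc
#   1   29  def
#   2   30  hij
# 1 0   31  hij
#   1   32  abc
# 2 0   28  abc
# ...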
Data input:
df=pd.DataFrame({'A':[[{"A": 28, "B": "abc"},{"A": 29, "B": "def"},{"A": 30, "B": "hij"}],
[{"A": 31, "B": "hij"},{"A": 32, "B": "abc"}],
[{"A": 28, "B": "abc"}],[{"A": 28, "B": "abc"},{"A": 29, "B": "def"},{"A": 30, "B": "hij"}],
[{"A": 28, "B": "abc"},{"A": 29, "B": "klm"},{"A": 30, "B": "nop"}]]})
For your other question, when reading from csv:
import ast
df=pd.read_csv(r'your.csv',dtype={'A':object})
df['A'] = df['A'].apply(ast.literal_eval)
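A minimal sketch of why literal_eval is needed here (the cell value below is just one of the sample rows): read_csv returns the column as plain strings, and ast.literal_eval safely parses each string back into a list of dicts without executing arbitrary code:

import ast
cell = '[{"A": 28, "B": "abc"}, {"A": 29, "B": "def"}, {"A": 30, "B": "hij"}]'  # one cell as read from the csv
parsed = ast.literal_eval(cell)       # -> list of dicts
print(type(parsed), parsed[0]["A"])   # <class 'list'> 28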
I assumed A is the list of lists of dictionaries:
A = [
[{"A": 28, "B": "abc"},{"A": 29, "B": "def"},{"A": 30, "B": "hij"}],
[{"A": 31, "B": "hij"},{"A": 32, "B": "abc"}],
[{"A": 28, "B": "abc"}],
[{"A": 28, "B": "abc"},{"A": 29, "B": "def"},{"A": 30, "B": "hij"}],
[{"A": 28, "B": "abc"},{"A": 29, "B": "klm"},{"A": 30, "B": "nop"}],
[{"A": 28, "B": "abc"},{"A": 29, "B": "xyz"}]
]
The first thing I'd do is use a comprehension to build a new dictionary, then groupby and ','.join:
B = {
(i, j, k): v
for j, row in enumerate(A)
for i, d in enumerate(row)
for k, v in d.items()
}
pd.Series(B).astype(str).groupby(level=[1, 2]).apply(','.join).unstack()
A B
0 28,29,30 abc,def,hij
1 31,32 hij,abc
2 28 abc
3 28,29,30 abc,def,hij
4 28,29,30 abc,klm,nop
5 28,29 abc,xyz
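To illustrate what the comprehension builds (shown for the first row of the sample data): the keys are (position within row, row number, column key) tuples, so grouping on levels 1 and 2 joins the values across positions within each row, and unstack turns the column keys into columns.

# entries of B contributed by A[0]:
# (0, 0, 'A'): 28,   (0, 0, 'B'): 'abc'
# (1, 0, 'A'): 29,   (1, 0, 'B'): 'def'
# (2, 0, 'A'): 30,   (2, 0, 'B'): 'hij'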
Thought I'd take a shot at this. First off, never use eval where you can avoid it. A better solution is ast.literal_eval:
import ast
df.A = df.A.apply(ast.literal_eval)
Next, flatten your column:
i = df.A.str.len().cumsum() # we'll need this later
df = pd.DataFrame.from_dict(np.concatenate(df.A).tolist())
df.A = df.A.astype(str)
df
A B
0 28 abc
1 29 def
2 30 hij
3 31 hij
4 32 abc
5 28 abc
6 28 abc
7 29 def
8 30 hij
9 28 abc
10 29 klm
11 30 nop
12 28 abc
13 29 xyz
Now, perform a groupby using intervals built from i.
idx = pd.cut(df.index, bins=np.append([0], i), include_lowest=True, right=False)
df = df.groupby(idx, as_index=False).agg(','.join)
df
A B
0 28,29,30 abc,def,hij
1 31,32 hij,abc
2 28 abc
3 28,29,30 abc,def,hij
4 28,29,30 abc,klm,nop
5 28,29 abc,xyz
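For reference, with the sample data the per-row list lengths are 3, 2, 1, 3, 3, 2, so the bins handed to pd.cut look like this:

# i    = [3, 5, 6, 9, 12, 14]              cumulative number of flattened rows
# bins = [0, 3, 5, 6, 9, 12, 14]
# idx labels the flattened rows with the intervals [0,3), [3,5), [5,6), [6,9), [9,12), [12,14),
# so rows 0-2, 3-4, 5, 6-8, 9-11 and 12-13 collapse back into the original six rows.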
With help from Bharath here.
A cool alternative to IntervalIndex (suggested by Wen) is to use np.put:
i = df.A.str.len().cumsum()
df = pd.DataFrame.from_dict(np.concatenate(df.A).tolist())
df.A = df.A.astype(str)
v = pd.Series(0, index=df.index)
np.put(v, i-1, [1] * len(i))
df = df.groupby(v[::-1].cumsum()).agg(','.join)[::-1].reset_index(drop=True)
df
A B
0 28,29,30 abc,def,hij
1 31,32 hij,abc
2 28 abc
3 28,29,30 abc,def,hij
4 28,29,30 abc,klm,nop
5 28,29 abc,xyz
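A sketch of how this grouping key works on the sample data: np.put writes a 1 at the last flattened row of each original list, and the reversed cumulative sum then gives every block its own label:

# i - 1 = [2, 4, 5, 8, 11, 13]
# v     = [0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1]
# v[::-1].cumsum() labels rows 0-2, 3-4, 5, 6-8, 9-11, 12-13 as 6, 5, 4, 3, 2, 1,
# i.e. the groups come out in reverse, which is why the result is flipped with [::-1]
# before reset_index(drop=True) restores the original row order.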
Timings, after scaling the sample frame up by a factor of 1000:
df = pd.concat([df] * 1000, ignore_index=True)
%%timeit
df.A.apply(pd.Series).stack().\
apply(pd.Series).groupby(level=0).\
agg(lambda x :','.join(x.astype(str)))
1 loop, best of 3: 8.76 s per loop
%%timeit
A = df.A.values.tolist()
B = {
(i, j, k): v
for j, row in enumerate(A)
for i, d in enumerate(row)
for k, v in d.items()
}
pd.Series(B).astype(str).groupby(level=[1, 2]).apply(','.join).unstack()
1 loop, best of 3: 2.08 s per loop
%%timeit
i = df.A.str.len().cumsum()
df2 = pd.DataFrame.from_dict(np.concatenate(df.A).tolist())
df2.A = df2.A.astype(str)
idx = pd.cut(df2.index, bins=np.append([0], i), include_lowest=True, right=False)
df2.groupby(idx, as_index=False).agg(','.join)
1 loop, best of 3: 810 ms per loop
%%timeit
i = df.A.str.len().cumsum()
df2 = pd.DataFrame.from_dict(np.concatenate(df.A).tolist())
df2.A = df2.A.astype(str)
v = pd.Series(0, index=df2.index)
np.put(v, i-1, [1] * len(i))
df2.groupby(v[::-1].cumsum()).agg(','.join)[::-1].reset_index(drop=True)
1 loop, best of 3: 548 ms per loop