How to count the occurrences of a column's value in a column of lists?
Consider the following dataframe:
```
   column_of_lists  scalar_col
0  [100, 200, 300]         100
1  [100, 200, 200]         200
2       [300, 500]         300
3       [100, 100]         200
```
The desired output is a Series giving the number of times each row's `scalar_col` value appears in that same row's list in `column_of_lists`.
So, in our example:
```
1  # 100 appears once in its respective list
2  # 200 appears twice in its respective list
1  # ...
0
```
I tried the following:
```python
df['column_of_lists'].apply(lambda x: x.count(df['scalar_col']))
```
I know it won't work, because I'm asking it to count a whole Series rather than a single value.
Any help is welcome!
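A minimal sketch of why the attempt fails, rebuilding the example frame from above. Inside `Series.apply` each `x` is one Python list, and `list.count` compares every element against the *entire* `scalar_col` Series with `==`. That comparison produces a boolean Series, and pandas refuses to reduce a Series to a single True/False, so the call raises `ValueError` rather than returning a count.

```python
import pandas as pd

df = pd.DataFrame({
    "column_of_lists": [[100, 200, 300], [100, 200, 200], [300, 500], [100, 100]],
    "scalar_col": [100, 200, 300, 200],
})

error_message = None
try:
    # Each x is a single list, but the argument is the whole scalar_col Series.
    df["column_of_lists"].apply(lambda x: x.count(df["scalar_col"]))
except ValueError as exc:
    # list.count evaluates `item == Series`, which yields a boolean Series;
    # pandas will not convert that Series into one truth value.
    error_message = str(exc)

print("ValueError:", error_message)
```

The fix is to pair each list with the scalar from the same row, which is what the answers do.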
Use a list comprehension:
```python
df['new'] = [x.count(y) for x, y in zip(df['column_of_lists'], df['scalar_col'])]
print(df)
```
```
   column_of_lists  scalar_col  new
0  [100, 200, 300]         100    1
1  [100, 200, 200]         200    2
2       [300, 500]         300    1
3       [100, 100]         200    0
```
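The same list comprehension, wrapped in a small reusable function as a sketch. The name `count_row_scalar` and its parameters are hypothetical, not part of pandas. It returns a Series carrying the frame's own index, so the result still lines up when the frame has a non-default index.

```python
import pandas as pd


def count_row_scalar(df: pd.DataFrame, list_col: str, scalar_col: str) -> pd.Series:
    """Count how often each row's scalar appears in that same row's list.

    Hypothetical helper around the zip-based list comprehension; the result
    keeps df's index so it aligns when assigned back as a column.
    """
    # zip walks both columns in lockstep as plain Python objects.
    counts = [lst.count(val) for lst, val in zip(df[list_col], df[scalar_col])]
    return pd.Series(counts, index=df.index, name=f"{scalar_col}_count")


df = pd.DataFrame({
    "column_of_lists": [[100, 200, 300], [100, 200, 200], [300, 500], [100, 100]],
    "scalar_col": [100, 200, 300, 200],
})
df["new"] = count_row_scalar(df, "column_of_lists", "scalar_col")
print(df["new"].tolist())  # [1, 2, 1, 0]
```

Iterating with `zip` avoids building a row Series for every row, which `DataFrame.apply(axis=1)` does; that is one reason the comprehension wins in the timings.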
If performance is not important, use `DataFrame.apply` with `axis=1`:
```python
df["new"] = df.apply(lambda x: x["column_of_lists"].count(x["scalar_col"]), axis=1)
```
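A sketch of the row-wise `apply` version with the lambda pulled out into a named function, under an assumption that is not in the question: some cells of `column_of_lists` may hold `NaN` instead of a list. The function name `count_in_row` and the choice to treat such rows as a count of 0 are both hypothetical.

```python
import numpy as np
import pandas as pd


def count_in_row(row: pd.Series) -> int:
    """Row-wise counter for DataFrame.apply(axis=1).

    Assumption (not in the question): a cell may hold NaN instead of a list,
    in which case the count is taken to be 0.
    """
    values = row["column_of_lists"]
    if not isinstance(values, list):
        return 0
    return values.count(row["scalar_col"])


df = pd.DataFrame({
    "column_of_lists": [[100, 200, 300], [100, 200, 200], np.nan, [100, 100]],
    "scalar_col": [100, 200, 300, 200],
})
# apply passes each row as a Series; the scalar results form a new Series.
df["new"] = df.apply(count_in_row, axis=1)
print(df["new"].tolist())  # [1, 2, 0, 0]
```

Without the guard, the `NaN` row would raise `AttributeError`, since a float has no `count` method.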
```
# 40k rows
df = pd.concat([df] * 10000, ignore_index=True)

In [145]: %timeit df["new1"] = df.apply(lambda x: x["column_of_lists"].count(x["scalar_col"]), axis=1)
572 ms ± 99.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [146]: %timeit df['new2'] = [x.count(y) for x, y in zip(df['column_of_lists'], df['scalar_col'])]
22.7 ms ± 840 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [147]: %%timeit
     ...: x = df.explode('column_of_lists')
     ...: df['counts'] = x.column_of_lists.eq(x.scalar_col).groupby(x.index).sum()
     ...:
61.2 ms ± 306 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
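A standalone version of the comparison above, as a sketch using the standard-library `timeit` module instead of IPython magics. The names `with_apply`, `with_zip` and `with_explode` are hypothetical wrappers around the three snippets. It first checks that all three give the same counts, then times each once; absolute numbers depend on hardware and pandas version, so no figures are claimed here.

```python
import timeit

import pandas as pd

base = pd.DataFrame({
    "column_of_lists": [[100, 200, 300], [100, 200, 200], [300, 500], [100, 100]],
    "scalar_col": [100, 200, 300, 200],
})
# Same 40k-row frame as in the session above.
df = pd.concat([base] * 10000, ignore_index=True)


def with_apply(frame):
    return frame.apply(lambda r: r["column_of_lists"].count(r["scalar_col"]), axis=1)


def with_zip(frame):
    return pd.Series(
        [x.count(y) for x, y in zip(frame["column_of_lists"], frame["scalar_col"])],
        index=frame.index,
    )


def with_explode(frame):
    x = frame.explode("column_of_lists")
    return x["column_of_lists"].eq(x["scalar_col"]).groupby(x.index).sum()


methods = {"apply": with_apply, "zip": with_zip, "explode": with_explode}

# All three must agree before their speed is worth comparing.
reference = with_zip(df).tolist()
for name, fn in methods.items():
    assert fn(df).tolist() == reference, name

for name, fn in methods.items():
    # The lambda is called immediately, so capturing the loop variable is safe.
    seconds = timeit.timeit(lambda: fn(df), number=1)
    print(f"{name:>8}: {seconds * 1000:.1f} ms")
```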
You can use `count` with `apply`.
Code:
```python
import pandas as pd

in_df = pd.DataFrame({"column_of_lists": [[100, 200, 300],
                                          [100, 200, 200],
                                          [300, 500],
                                          [100, 100]],
                      "scalar_col": [100, 200, 300, 200]})

# Row by row, count how often scalar_col appears in that row's list.
in_df["Match_Count"] = in_df.apply(lambda x: x["column_of_lists"].count(x["scalar_col"]), axis=1)
```
Output:
```
   column_of_lists  scalar_col  Match_Count
0  [100, 200, 300]         100            1
1  [100, 200, 200]         200            2
2       [300, 500]         300            1
3       [100, 100]         200            0
```
For larger lists, a vectorized approach is to use `DataFrame.explode` followed by `GroupBy.sum`:
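```python
# One row per list element; the original index label is repeated.
x = df.explode('column_of_lists')
# Flag elements equal to the row's scalar, then sum the flags per original row.
df['counts'] = x.column_of_lists.eq(x.scalar_col).groupby(x.index).sum()
```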
Output:

```
   column_of_lists  scalar_col  counts
0  [100, 200, 300]         100       1
1  [100, 200, 200]         200       2
2       [300, 500]         300       1
3       [100, 100]         200       0
```
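A sketch that prints the intermediate exploded frame's index to show how the `explode` approach works. The extra fifth row with an empty list is an added assumption, not part of the question. `explode` repeats each original index label once per element, so `groupby(x.index)` folds the matches back onto the original rows; an empty list becomes a single `NaN` row, which never equals the scalar and therefore counts as 0. Because the grouping is by index label, this relies on the frame having a unique index; with duplicate labels, rows would be merged, and a `reset_index` first would be needed.

```python
import pandas as pd

df = pd.DataFrame({
    "column_of_lists": [[100, 200, 300], [100, 200, 200], [300, 500], [100, 100], []],
    "scalar_col": [100, 200, 300, 200, 500],
})

# Each list element gets its own row; scalar_col is copied alongside it.
x = df.explode("column_of_lists")
print(x.index.tolist())  # [0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 4]

# Element-wise match, then sum the booleans per original row label.
# The empty list exploded to NaN, which compares unequal to 500 -> 0.
df["counts"] = x["column_of_lists"].eq(x["scalar_col"]).groupby(x.index).sum()
print(df["counts"].tolist())  # [1, 2, 1, 0, 0]
```

`DataFrame.explode` requires pandas 0.25 or later.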
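Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.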