How to count occurrences of a column value in a column of lists?

Question

Consider the following dataframe:

    column_of_lists   scalar_col
0   [100, 200, 300]       100
1   [100, 200, 200]       200
2   [300, 500]            300
3   [100, 100]            200

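The desired output is a Series giving the number of times the scalar value in scalar_col appears in the list on the same row of column_of_lists.

So, in our example: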

1 # 100 appears once in its respective list
2 # 200 appears twice in its respective list
1 # ...
0

I tried the following:

df['column_of_lists'].apply(lambda x: x.count(df['scalar_col']))

I know it won't work, because I am asking it to count a whole Series rather than a single value.

Any help is welcome!
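For context, here is a minimal sketch of why the attempt fails, assuming the DataFrame from the question. `list.count` compares each list element against the entire `scalar_col` Series, and pandas refuses to reduce that element-wise comparison to a single True/False. The variable name `error_message` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "column_of_lists": [[100, 200, 300], [100, 200, 200], [300, 500], [100, 100]],
    "scalar_col": [100, 200, 300, 200],
})

try:
    # x.count(...) receives the whole scalar_col Series, not one value per row
    df["column_of_lists"].apply(lambda x: x.count(df["scalar_col"]))
    error_message = None
except ValueError as exc:
    # element == Series yields a boolean Series, whose truth value is ambiguous
    error_message = str(exc)

print(error_message)
```

The fix in every answer is the same idea: pair each list with the scalar from its own row before calling `count`.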

Answer 1

Use a list comprehension:

df['new'] = [x.count(y) for x,y in zip(df['column_of_lists'], df['scalar_col'])]
print (df)
   column_of_lists  scalar_col  new
0  [100, 200, 300]         100    1
1  [100, 200, 200]         200    2
2       [300, 500]         300    1
3       [100, 100]         200    0
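The snippet above assumes `df` already exists. A self-contained version, assuming the data from the question, might look like this; the loop variable names `lst` and `val` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "column_of_lists": [[100, 200, 300], [100, 200, 200], [300, 500], [100, 100]],
    "scalar_col": [100, 200, 300, 200],
})

# zip pairs each row's list with that row's scalar,
# so list.count is always called with a single value
df["new"] = [lst.count(val) for lst, val in zip(df["column_of_lists"], df["scalar_col"])]
print(df)
```

Because the comprehension iterates over plain Python objects, it avoids the per-row overhead of building a Series for each row, which is likely why it is the fastest option in the timings.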

If performance is not important, use DataFrame.apply with axis=1:

df["new"] = df.apply(lambda x: x["column_of_lists"].count(x["scalar_col"]), axis=1)

#40k rows
df = pd.concat([df] * 10000, ignore_index=True)

In [145]: %timeit df["new1"] = df.apply(lambda x: x["column_of_lists"].count(x["scalar_col"]), axis=1)
572 ms ± 99.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [146]: %timeit df['new2'] = [x.count(y) for x,y in zip(df['column_of_lists'], df['scalar_col'])]
22.7 ms ± 840 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [147]: %%timeit
     ...: x = df.explode('column_of_lists')
     ...: df['counts'] = x.column_of_lists.eq(x.scalar_col).groupby(x.index).sum()
     ...: 
61.2 ms ± 306 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
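The timings above use IPython's `%timeit`. Outside IPython, a rough equivalent can be sketched with the standard `timeit` module. The function names `with_apply`, `with_zip`, and `with_explode` are hypothetical wrappers around the three approaches, and the numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

base = pd.DataFrame({
    "column_of_lists": [[100, 200, 300], [100, 200, 200], [300, 500], [100, 100]],
    "scalar_col": [100, 200, 300, 200],
})
df = pd.concat([base] * 10000, ignore_index=True)  # 40k rows, as in the answer


def with_apply(frame):
    # row-wise apply: simple but slow
    return frame.apply(lambda r: r["column_of_lists"].count(r["scalar_col"]), axis=1)


def with_zip(frame):
    # list comprehension over paired columns
    counts = [x.count(y) for x, y in zip(frame["column_of_lists"], frame["scalar_col"])]
    return pd.Series(counts, index=frame.index)


def with_explode(frame):
    # one row per list element, then sum the matches per original row
    x = frame.explode("column_of_lists")
    return x["column_of_lists"].eq(x["scalar_col"]).groupby(x.index).sum()


for fn in (with_apply, with_zip, with_explode):
    seconds = timeit.timeit(lambda: fn(df), number=3) / 3
    print(f"{fn.__name__}: {seconds * 1000:.1f} ms per call")
```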

Answer 2

You can use count together with apply.

Code:

import pandas as pd
in_df = pd.DataFrame({"column_of_lists": [[100, 200, 300] , 
                                        [100, 200, 200], 
                                        [300, 500], [100, 100]],
            "scalar_col": [100, 200, 300, 200]})
in_df["Match_Count"] = in_df.apply(lambda x:x["column_of_lists"].count(x["scalar_col"]), axis=1)

Output:

   column_of_lists  scalar_col  Match_Count
0  [100, 200, 300]         100            1
1  [100, 200, 200]         200            2
2       [300, 500]         300            1
3       [100, 100]         200            0

Answer 3

A vectorized approach for larger lists is to use DataFrame.explode followed by GroupBy.sum:

x = df.explode('column_of_lists')
df['counts'] = x.column_of_lists.eq(x.scalar_col).groupby(x.index).sum()

Output

   column_of_lists  scalar_col  counts
0  [100, 200, 300]         100       1
1  [100, 200, 200]         200       2
2       [300, 500]         300       1
3       [100, 100]         200       0
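To make the explode approach easier to follow, here is a step-by-step sketch with the intermediate values named. It assumes the question's data plus one extra row with an empty list (an addition for illustration, not part of the original question), and it assumes the DataFrame index is unique, since the counts are grouped back by index.

```python
import pandas as pd

df = pd.DataFrame({
    "column_of_lists": [[100, 200, 300], [100, 200, 200], [300, 500], [100, 100], []],
    "scalar_col": [100, 200, 300, 200, 100],  # last row is an illustrative extra
})

# Step 1: one row per list element; the original index is repeated,
# and an empty list becomes a single NaN row
x = df.explode("column_of_lists")

# Step 2: element-wise comparison of each list element with its row's scalar
matches = x["column_of_lists"].eq(x["scalar_col"])

# Step 3: sum the True values per original row (True counts as 1)
df["counts"] = matches.groupby(x.index).sum()
print(df)
```

The empty list contributes a NaN, which never equals the scalar, so that row gets a count of 0.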


3 answers

Answer 1
score 6, accepted, 2023-01-12 12:34:49

Answer 2
score 1, 2023-01-12 12:36:04

Answer 3
score 1, 2023-01-12 12:46:12
