How do I count occurrences of a column's value in a column of lists?

Question

Consider the following dataframe:

    column_of_lists   scalar_col
0   [100, 200, 300]       100
1   [100, 200, 200]       200
2   [300, 500]            300
3   [100, 100]            200

The desired output is a Series giving, for each row, how many times the value in scalar_col appears in that row's list in column_of_lists.

So, in our case:

1 # 100 appears once in its respective list
2 # 200 appears twice in its respective list
1 # ...
0

I have tried something along the lines of:

df['column_of_lists'].apply(lambda x: x.count(df['scalar_col']))

and I understand why it doesn't work: I am asking each list to count a whole Series instead of a single value.

Any help would be welcome!

Answer 1

Use a list comprehension:

df['new'] = [x.count(y) for x,y in zip(df['column_of_lists'], df['scalar_col'])]
print(df)
   column_of_lists  scalar_col  new
0  [100, 200, 300]         100    1
1  [100, 200, 200]         200    2
2       [300, 500]         300    1
3       [100, 100]         200    0
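For anyone who wants to run this end to end, here is a minimal self-contained sketch of the list comprehension above. The DataFrame is rebuilt from the question's sample data, and count_in_list is just an illustrative helper name, not a pandas function. It assumes every cell in the list column is a real Python list.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "column_of_lists": [[100, 200, 300], [100, 200, 200], [300, 500], [100, 100]],
    "scalar_col": [100, 200, 300, 200],
})


def count_in_list(frame, list_col, scalar_col):
    """Count, row by row, how often frame[scalar_col] occurs in frame[list_col].

    Hypothetical helper for illustration; it assumes every cell of
    list_col is a Python list (no NaN or None cells).
    """
    # zip pairs each list with the scalar from the same row.
    return [lst.count(val) for lst, val in zip(frame[list_col], frame[scalar_col])]


df["new"] = count_in_list(df, "column_of_lists", "scalar_col")
print(df["new"].tolist())  # [1, 2, 1, 0]
```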

If performance is not important, use DataFrame.apply with axis=1:

df["new"] = df.apply(lambda x: x["column_of_lists"].count(x["scalar_col"]), axis=1)

#40k rows
df = pd.concat([df] * 10000, ignore_index=True)

In [145]: %timeit df["new1"] = df.apply(lambda x: x["column_of_lists"].count(x["scalar_col"]), axis=1)
572 ms ± 99.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [146]: %timeit df['new2'] = [x.count(y) for x,y in zip(df['column_of_lists'], df['scalar_col'])]
22.7 ms ± 840 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [147]: %%timeit
     ...: x = df.explode('column_of_lists')
     ...: df['counts'] = x.column_of_lists.eq(x.scalar_col).groupby(x.index).sum()
     ...: 
61.2 ms ± 306 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
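The %timeit lines above only work inside IPython. As a rough sketch, the same comparison can be reproduced in a plain script with the standard timeit module. The function names with_apply, with_zip and with_explode are made up for this example, and the frame is repeated only 1,000 times to keep the run short, so the absolute numbers will differ from the ones shown above.

```python
import timeit

import pandas as pd

# Sample data from the question, repeated to 4,000 rows (the answer used 40k).
base = pd.DataFrame({
    "column_of_lists": [[100, 200, 300], [100, 200, 200], [300, 500], [100, 100]],
    "scalar_col": [100, 200, 300, 200],
})
df = pd.concat([base] * 1000, ignore_index=True)


def with_apply(d):
    # Row-wise apply: simple, but calls the lambda once per row.
    return d.apply(lambda r: r["column_of_lists"].count(r["scalar_col"]), axis=1)


def with_zip(d):
    # List comprehension over the two columns in lockstep.
    counts = [lst.count(val) for lst, val in zip(d["column_of_lists"], d["scalar_col"])]
    return pd.Series(counts, index=d.index)


def with_explode(d):
    # Explode to one row per list element, compare, then sum per original row.
    x = d.explode("column_of_lists")
    return x["column_of_lists"].eq(x["scalar_col"]).groupby(x.index).sum()


# All three should agree before the timings mean anything.
expected = with_zip(df).tolist()
assert with_apply(df).tolist() == expected
assert with_explode(df).tolist() == expected

for fn in (with_apply, with_zip, with_explode):
    seconds = timeit.timeit(lambda: fn(df), number=3) / 3
    print(f"{fn.__name__}: {seconds * 1000:.1f} ms per run")
```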

Answer 2

You can use list.count together with DataFrame.apply.

Code:

import pandas as pd
in_df = pd.DataFrame({"column_of_lists": [[100, 200, 300] , 
                                        [100, 200, 200], 
                                        [300, 500], [100, 100]],
            "scalar_col": [100, 200, 300, 200]})
in_df["Match_Count"] = in_df.apply(lambda x:x["column_of_lists"].count(x["scalar_col"]), axis=1)

Output:

   column_of_lists  scalar_col  Match_Count
0  [100, 200, 300]         100            1
1  [100, 200, 200]         200            2
2       [300, 500]         300            1
3       [100, 100]         200            0

Answer 3

A vectorized approach for larger lists is to use DataFrame.explode followed by GroupBy.sum:

x = df.explode('column_of_lists')
df['counts'] = x.column_of_lists.eq(x.scalar_col).groupby(x.index).sum()

Output:

   column_of_lists  scalar_col  counts
0  [100, 200, 300]         100       1
1  [100, 200, 200]         200       2
2       [300, 500]         300       1
3       [100, 100]         200       0
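To see why this works, it may help to look at the intermediate steps on the question's sample data. This is only a sketch; the variable name matches is introduced here for illustration. Note that the grouping key is the row label, so the approach assumes df has a unique index (calling reset_index(drop=True) first is one way to guarantee that).

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "column_of_lists": [[100, 200, 300], [100, 200, 200], [300, 500], [100, 100]],
    "scalar_col": [100, 200, 300, 200],
})

# Step 1: one row per list element; the original row label is repeated
# and scalar_col is copied alongside each element.
x = df.explode("column_of_lists")

# Step 2: compare each exploded element with the scalar from its own row.
matches = x["column_of_lists"].eq(x["scalar_col"])

# Step 3: sum the booleans per original row label to get the counts.
df["counts"] = matches.groupby(x.index).sum()

print(matches.tolist())
# [True, False, False, False, True, True, True, False, False, False]
print(df["counts"].tolist())  # [1, 2, 1, 0]
```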
