Find the most common value(s) in each row of a column of lists

Question

I have a pd.DataFrame in which one column contains lists as values. I want to create another column that holds only the most common value(s) of each list. Example dataframe:

    col_1
0   [1, 2, 3, 3]
1   [2, 2, 8, 8, 7]
2   [3, 4]

And the expected dataframe is:

    col_1           col_2
0   [1, 2, 3, 3]    [3]
1   [2, 2, 8, 8, 7] [2, 8]
2   [3, 4]          [3, 4]

I tried:

from statistics import mode
df['col_1'].apply(lambda x: mode(x))

But it shows the most common list in that column.

I also tried using the pandas mode function directly on that column, but that did not help either. Is there any way to find the most common value(s)?
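To see why applying statistics.mode to each list cannot produce [2, 8], here is a small check, assuming Python 3.8 or later (from 3.8 on, mode returns the first of several tied values instead of raising StatisticsError). statistics.multimode, also added in 3.8, returns every tied value.

```python
from statistics import mode, multimode

rows = [[1, 2, 3, 3], [2, 2, 8, 8, 7], [3, 4]]

for row in rows:
    # mode() returns one value; on a tie it picks the first value encountered
    # multimode() returns all values sharing the highest count, in first-seen order
    print(row, mode(row), multimode(row))
```

For the middle row, mode gives only 2, while multimode gives [2, 8].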

Answer 1

Or just use multimode from the statistics module:

from statistics import multimode

df['col_2'] = df['col_1'].apply(lambda x: multimode(x))

             col_1   col_2
0     [1, 2, 3, 3]     [3]
1  [2, 2, 8, 8, 7]  [2, 8]
2           [3, 4]  [3, 4]
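A minimal sketch of the same idea with one extra row, an empty list added here purely for illustration: multimode returns [] for empty input rather than raising, and because it already takes a single iterable it can be passed to apply without a lambda.

```python
from statistics import multimode

import pandas as pd

# Question's data plus an empty list (the empty row is an illustrative assumption)
df = pd.DataFrame({'col_1': [[1, 2, 3, 3], [2, 2, 8, 8, 7], [3, 4], []]})

# multimode takes one iterable, so it can be handed to apply directly
df['col_2'] = df['col_1'].apply(multimode)
print(df)
```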

Answer 2

Use Series.mode, but it is slow:

df['new'] = df['col_1'].apply(lambda x: pd.Series(x).mode().tolist()) 
print (df)
             col_1     new
0     [1, 2, 3, 3]     [3]
1  [2, 2, 8, 8, 7]  [2, 8]
2           [3, 4]  [3, 4]
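One behavioural difference worth knowing, shown with a made-up row: Series.mode returns the tied values in sorted order (and drops NaN unless dropna=False), while statistics.multimode keeps them in the order they first appear.

```python
from statistics import multimode

import pandas as pd

row = [8, 8, 2, 2, 7]  # hypothetical row where the larger tied value comes first

# Series.mode: all tied values, sorted ascending
print(pd.Series(row).mode().tolist())  # [2, 8]
# multimode: all tied values, in first-seen order
print(multimode(row))  # [8, 2]
```

For the question's data both give the same lists, but the ordering can differ when ties appear out of sorted order.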

Or use statistics.multimode if performance is important:

from statistics import multimode

df['col_2'] = df['col_1'].apply(multimode) 
print (df)
             col_1   col_2
0     [1, 2, 3, 3]     [3]
1  [2, 2, 8, 8, 7]  [2, 8]
2           [3, 4]  [3, 4]

Performance:

#[3000 rows x 4 columns]
df = pd.concat([df] * 1000, ignore_index=True)

In [195]: %timeit (df['col_1'].explode().groupby(level=0).apply(lambda x: x.mode().tolist()))
537 ms ± 66.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [196]: %timeit df['col_1'].apply(lambda x: pd.Series(x).mode().tolist())
699 ms ± 77.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [197]: %timeit df['col_1'].apply(multimode)
13.5 ms ± 1.03 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
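To rerun the comparison on your own data, here is a sketch using the standard timeit module. The three lambdas mirror the snippets in these answers. The sample-row check makes sure the approaches agree before they are timed. The printed timings depend on hardware and pandas version, so no figures are given here.

```python
import timeit
from statistics import multimode

import pandas as pd

df = pd.DataFrame({'col_1': [[1, 2, 3, 3], [2, 2, 8, 8, 7], [3, 4]]})
big = pd.concat([df] * 1000, ignore_index=True)  # 3000 rows, as in the timings above

approaches = {
    'explode + groupby': lambda s: s.explode().groupby(level=0).apply(lambda x: x.mode().tolist()),
    'Series.mode per row': lambda s: s.apply(lambda x: pd.Series(x).mode().tolist()),
    'multimode per row': lambda s: s.apply(multimode),
}

# Sanity check: every approach should produce the same lists on the sample rows
sample = {name: fn(df['col_1']).tolist() for name, fn in approaches.items()}

for name, fn in approaches.items():
    # A single run each; raise `number` for steadier measurements
    seconds = timeit.timeit(lambda: fn(big['col_1']), number=1)
    print(f'{name}: {seconds * 1000:.1f} ms')
```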

Answer 3

Using mode per group:

df['col_2'] = (df['col_1']
               .explode()
               .groupby(level=0)
               .apply(lambda x: x.mode().tolist())
              )

output:

             col_1   col_2
0     [1, 2, 3, 3]     [3]
1  [2, 2, 8, 8, 7]  [2, 8]
2           [3, 4]  [3, 4]
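The reason groupby(level=0) works here is that explode() keeps the original row label on every element it produces, so grouping by that label collects each list's elements again. A short sketch on the question's data:

```python
import pandas as pd

df = pd.DataFrame({'col_1': [[1, 2, 3, 3], [2, 2, 8, 8, 7], [3, 4]]})

# explode() puts each list element on its own row, repeating the original label
exploded = df['col_1'].explode()
print(exploded.index.tolist())  # [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2]

# grouping on that repeated label rebuilds one group per original list
modes = exploded.groupby(level=0).apply(lambda x: x.mode().tolist())
print(modes.tolist())  # [[3], [2, 8], [3, 4]]
```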

Answer 4

Try this:

from collections import Counter
import pandas as pd

col_1 = [[1, 2, 3, 3],[2, 2, 8, 8, 7],[3, 4]]
df = pd.DataFrame({'col_1':col_1})

def common(row):
    c = Counter(row)
    c = pd.Series(c)
    return c[c==max(c)].index.values

df['col_2'] = df.col_1.map(common)

df gives:

     col_1            col_2
0    [1, 2, 3, 3]     [3]
1    [2, 2, 8, 8, 7]  [2, 8]
2    [3, 4]           [3, 4]
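Note that common returns a NumPy array (from index.values) and builds a pd.Series for every row; it also fails on an empty list, because max() on an empty sequence raises ValueError. A lighter variant, sketched here as an alternative rather than the answerer's code, stays with collections.Counter and returns plain lists. The helper name common_list is hypothetical, and its logic mirrors what statistics.multimode does.

```python
from collections import Counter

import pandas as pd


def common_list(row):
    """Hypothetical helper: every value tied for the highest count, in first-seen order."""
    counts = Counter(row)
    if not counts:  # empty list: nothing to count
        return []
    top = max(counts.values())
    return [value for value, n in counts.items() if n == top]


df = pd.DataFrame({'col_1': [[1, 2, 3, 3], [2, 2, 8, 8, 7], [3, 4]]})
df['col_2'] = df['col_1'].map(common_list)
print(df)
```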

4 answers

Answer 1: 5 votes, 2022-10-11 10:01:57

Answer 2: 4 votes, 2022-10-11 09:57:12

Answer 3: 3 votes, accepted, 2022-10-11 09:56:00

Answer 4: 1 vote, 2022-10-11 10:25:34
