How do I filter a pandas DataFrame by the value counts of a column?

Question

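I have a pandas DataFrame of shape (1000, 8), and I want to build a new DataFrame from it using a condition on one column, but not a simple row-by-row condition. For example, given df.column1 = [1,2,2,2,3,3,4,5,8,8,8,8], I want a DataFrame with the same columns that keeps only the rows whose column1 value is repeated more than 3 times, so that I get: df.column1 = [8,8,8,8]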

Answer 1

You can use value_counts and keep only the values that occur more than 3 times.

import pandas as pd
# define df
df = pd.DataFrame()
df['column1'] = [1,2,2,2,3,3,4,5,8,8,8,8] 

#get counts
counts = df['column1'].value_counts()

# keep only counts>3
counts = counts[counts>3]

# get the index to see which column1 values should be kept
to_keep = counts.index

# filter df to the rows whose column1 value is in to_keep
new_df = df.loc[df['column1'].isin(to_keep)]
print(new_df)

#     column1
# 8         8
# 9         8
# 10        8
# 11        8
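
The same steps can be wrapped in a small reusable function. This is only a sketch: the name keep_frequent and its parameters are hypothetical and not part of pandas; it assumes "more than min_count times" means a strict comparison, as in the question.

```python
import pandas as pd


def keep_frequent(df, column, min_count):
    """Return the rows of df whose value in `column` occurs more than min_count times."""
    # count how often each distinct value appears in the column
    counts = df[column].value_counts()
    # values whose count is strictly greater than the threshold
    frequent = counts[counts > min_count].index
    # keep the rows holding one of those values; all columns are preserved
    return df.loc[df[column].isin(frequent)]


df = pd.DataFrame({'column1': [1, 2, 2, 2, 3, 3, 4, 5, 8, 8, 8, 8]})
result = keep_frequent(df, 'column1', 3)
print(result)
```

With the sample data, only the four rows holding 8 remain, and the original index labels (8 to 11) are kept.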

Answer 2

Use GroupBy.filter:

Here is an example

import pandas as pd
# define df
df = pd.DataFrame()
df['column1'] = [1,2,2,2,3,3,4,5,8,8,8,8]
df['column2']=range(0,len(df['column1']))

Method 1

new_df=df.groupby('column1').filter(lambda x: x.column1.size>3)
print(new_df)
    column1  column2
8         8        8
9         8        9
10        8       10
11        8       11

Method 2

Or use GroupBy.transform to perform boolean indexing:

new_df=df[df.groupby('column1').column1.transform('size')>3]
print(new_df)
    column1  column2
8         8        8
9         8        9
10        8       10
11        8       11
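
To see why the transform approach works, it may help to look at the intermediate values. The sketch below rebuilds the same sample data; the variable names group_sizes and mask are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'column1': [1, 2, 2, 2, 3, 3, 4, 5, 8, 8, 8, 8]})
df['column2'] = range(len(df))

# for every row, the size of the group that row belongs to,
# returned as a Series aligned with df's index
group_sizes = df.groupby('column1')['column1'].transform('size')
print(group_sizes.tolist())

# boolean mask: True where the row's value occurs more than 3 times
mask = group_sizes > 3
print(mask.tolist())

# boolean indexing keeps only the rows where the mask is True
new_df = df[mask]
print(new_df)
```

Because transform returns one value per original row, the comparison produces a mask that can index df directly.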

Method 3

Finally, if you would rather use value_counts, it works better in combination with Series.map:

new_df=df[df.column1.map(df.column1.value_counts())>3]
print(new_df)
    column1  column2
8         8        8
9         8        9
10        8       10
11        8       11
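
As a quick sanity check, the three methods can be compared on the sample data. This is a sketch, not a benchmark; the names by_filter, by_transform and by_map are illustrative, and the filter lambda uses len(g) here instead of x.column1.size so that it does not depend on the grouping column being present inside each group.

```python
import pandas as pd

df = pd.DataFrame({'column1': [1, 2, 2, 2, 3, 3, 4, 5, 8, 8, 8, 8]})
df['column2'] = range(len(df))

# Method 1: keep whole groups that have more than 3 rows
by_filter = df.groupby('column1').filter(lambda g: len(g) > 3)

# Method 2: per-row group size, then boolean indexing
by_transform = df[df.groupby('column1')['column1'].transform('size') > 3]

# Method 3: map each value to its count, then boolean indexing
by_map = df[df['column1'].map(df['column1'].value_counts()) > 3]

# all three should select the same rows with the same index
same = by_filter.equals(by_transform) and by_filter.equals(by_map)
print(same)
```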

2 answers

Answer 1
1 2019-11-19 00:40:04

Answer 2
0 accepted 2019-11-19 00:45:48
