
Remove rows when the occurrence of a column value in the data frame is less than a certain number using pandas/python?

I have a data frame like this:

df
col1    col2
A         1
B         1
C         2
D         3
D         2
B         1
D         5

The col1 values B and D each occur more than once in the data frame.

I want to keep only the rows whose col1 value occurs more than once, so the final data frame will look like:

col1     col2
 B         1
 D         3
 D         2
 B         1
 D         5

How can I do this in the most efficient way using pandas/python?

You can use duplicated with keep=False, which returns True for every row whose col1 value appears more than once, and then simply use boolean indexing on the dataframe:

df[df.col1.duplicated(keep=False)]

   col1  col2
1    B     1
3    D     3
4    D     2
5    B     1
6    D     5
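As a self-contained sketch of the same idea, the snippet below rebuilds the sample data from the question and keeps the intermediate mask in a variable so it can be inspected. The names mask and result are only illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "col1": ["A", "B", "C", "D", "D", "B", "D"],
    "col2": [1, 1, 2, 3, 2, 1, 5],
})

# keep=False flags every member of a repeated group, not only the later repeats.
mask = df["col1"].duplicated(keep=False)

# Boolean indexing keeps the flagged rows and preserves the original index labels.
result = df[mask]
print(result)
```

With the default keep='first', the first B and the first D would not be flagged, so the filter would drop them as well.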

Update

To keep only the rows whose col1 value occurs more than thr times, use:

thr = 2
df[df.col1.duplicated(keep=False).groupby(df.col1).transform('sum').gt(thr)]

   col1  col2
3    D     3
4    D     2
6    D     5
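The chained expression above can be easier to follow when split into steps. The sketch below does that on the question's sample data; the intermediate names dup, dup_count and filtered are illustrative, not part of the answer.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "col1": ["A", "B", "C", "D", "D", "B", "D"],
    "col2": [1, 1, 2, 3, 2, 1, 5],
})
thr = 2

# Step 1: flag every row whose col1 value appears more than once.
dup = df["col1"].duplicated(keep=False)

# Step 2: sum the flags per col1 group and broadcast the total back to each row.
dup_count = dup.groupby(df["col1"]).transform("sum")

# Step 3: keep the rows whose group total exceeds the threshold.
filtered = df[dup_count.gt(thr)]
print(filtered)
```

One thing to keep in mind: a value that occurs exactly once gets a total of 0 here rather than 1, because it is never flagged as a duplicate. For thr of 1 or more this makes no difference, but for thr = 0 the result would differ from a filter based on true counts.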

Use DataFrame.duplicated, specifying column col1 to search for duplicates, with keep=False so that it returns True for all duplicated rows, then filter by boolean indexing:

df = df[df.duplicated('col1', keep=False)]
print(df)
  col1  col2
1    B     1
3    D     3
4    D     2
5    B     1
6    D     5

If you need to specify a threshold, use transform with size and filter the same way as in the first solution:

df = df[df.groupby('col1')['col1'].transform('size') > 1]
print(df)
  col1  col2
1    B     1
3    D     3
4    D     2
5    B     1
6    D     5
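To make the threshold explicit, the transform('size') approach could be wrapped in a small function. This is only a sketch: filter_by_count and its parameter names are hypothetical, not a pandas API.

```python
import pandas as pd

def filter_by_count(frame, column, thr):
    """Hypothetical helper: keep rows whose `column` value occurs more than `thr` times."""
    # transform('size') returns, for every row, the size of the group it belongs to.
    counts = frame.groupby(column)[column].transform("size")
    return frame[counts > thr]

# Sample data from the question.
df = pd.DataFrame({
    "col1": ["A", "B", "C", "D", "D", "B", "D"],
    "col2": [1, 1, 2, 3, 2, 1, 5],
})

print(filter_by_count(df, "col1", 1))  # rows for B and D
print(filter_by_count(df, "col1", 2))  # rows for D only
```

Because size counts every row in the group, including values that occur once, the comparison behaves as a plain occurrence count for any thr.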

Alternative solution with value_counts and map:

df = df[df['col1'].map(df['col1'].value_counts()) > 1]
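The one-liner above combines two steps: counting each value, then looking up that count for every row. A minimal sketch on the question's sample data, with illustrative variable names:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "col1": ["A", "B", "C", "D", "D", "B", "D"],
    "col2": [1, 1, 2, 3, 2, 1, 5],
})

# value_counts returns a Series indexed by the distinct col1 values.
counts = df["col1"].value_counts()

# map looks up each row's col1 value in that Series, giving one count per row.
per_row = df["col1"].map(counts)

result = df[per_row > 1]
print(result)
```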

If performance is not important, use DataFrameGroupBy.filter:

df = df.groupby('col1').filter(lambda x: len(x) > 1)
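As a quick sanity check, the sketch below runs the four approaches from this answer on the question's sample data and compares the results. The variable names are illustrative, and each result is stored separately instead of reassigning df.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "col1": ["A", "B", "C", "D", "D", "B", "D"],
    "col2": [1, 1, 2, 3, 2, 1, 5],
})

by_duplicated = df[df.duplicated("col1", keep=False)]
by_transform = df[df.groupby("col1")["col1"].transform("size") > 1]
by_map = df[df["col1"].map(df["col1"].value_counts()) > 1]
# filter calls the Python lambda once per group.
by_filter = df.groupby("col1").filter(lambda g: len(g) > 1)

# DataFrame.equals compares values, dtypes and index labels.
all_equal = all(
    by_duplicated.equals(other) for other in (by_transform, by_map, by_filter)
)
print(all_equal)
```

The per-group Python call in filter is the likely reason it is the slowest option when there are many distinct col1 values; the other three rely on vectorized operations.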


