简体   繁体   English

根据所有列中的值有条件地在熊猫中删除行

[英]Drop rows in pandas conditionally based on values across all columns

I have a count matrix which represents % abundance, with samples as columns and observations as rows eg: 我有一个表示%丰度的计数矩阵,其中样本作为列,观察值作为行,例如:

#OTUId  101.BGd_295  103.BGd_309  105.BGd_310  11.BGd_99   123.BGd_312  
OTU_200 0.016806723  0.23862789   0.148210883  0.6783      0.126310471  
OTU_54  0.253542133  0.169383866  0            0.113679432 0.173943294
OTU_2   0.033613445  16.58463833  19.66970146  16.06669119 20.92537833  

I am trying to filter the dataframe using pandas, keeping only those rows which have at least one value above 0.5%. 我正在尝试使用熊猫过滤数据框,仅保留那些至少有一个大于0.5%的值的行。 I initially found this 我最初发现这个

df = df[(df > 0.5).sum(axis=1) >= 1]

which i thought would do the trick but now as far as I understand this will instead keep those in which the sum across the row is greater than 0.5. 我以为可以做到这一点,但据我所知,现在将保留行中总和大于0.5的那些。 How can I modify this to suit? 我该如何修改以适合?

thanks! 谢谢!

I think simplier solution is use condition for boolean DataFrame and then check by any for at least one True per row, last filter by boolean indexing : 我认为更简单的解决方案是对布尔DataFrame使用条件,然后按any条件检查每行至少一个True ,最后通过boolean indexing过滤:

print (df.drop('#OTUId',axis=1) > 0.5)
   101.BGd_295  103.BGd_309  105.BGd_310  11.BGd_99  123.BGd_312
0        False        False        False       True        False
1        False        False        False      False        False
2        False         True         True       True         True

print ((df.drop('#OTUId',axis=1) > 0.5).any(axis=1))
0     True
1    False
2     True
dtype: bool

df = df[(df.drop('#OTUId',axis=1) > 0.5).any(axis=1)]
print (df)
    #OTUId  101.BGd_295  103.BGd_309  105.BGd_310  11.BGd_99  123.BGd_312
0  OTU_200     0.016807     0.238628     0.148211   0.678300     0.126310
2    OTU_2     0.033613    16.584638    19.669701  16.066691    20.925378

Your code: 您的代码:

df = df[(df > 0.5).sum(axis=1) >= 1]

#boolean mask
print (df > 0.5)
   #OTUId  101.BGd_295  103.BGd_309  105.BGd_310  11.BGd_99  123.BGd_312
0    True        False        False        False       True        False
1    True        False        False        False      False        False
2    True        False         True         True       True         True

#count True values per row
print ((df > 0.5).sum(axis=1))
0    2
1    1
2    5
dtype: int64

#check values by condition
print ((df > 0.5).sum(axis=1) >= 1)
0    True
1    True
2    True
dtype: bool

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM