Drop rows in pandas conditionally based on values across all columns
I have a count matrix which represents % abundance, with samples as columns and observations as rows, e.g.:
#OTUId 101.BGd_295 103.BGd_309 105.BGd_310 11.BGd_99 123.BGd_312
OTU_200 0.016806723 0.23862789 0.148210883 0.6783 0.126310471
OTU_54 0.253542133 0.169383866 0 0.113679432 0.173943294
OTU_2 0.033613445 16.58463833 19.66970146 16.06669119 20.92537833
I am trying to filter the DataFrame using pandas, keeping only those rows which have at least one value above 0.5%. I initially found this:
df = df[(df > 0.5).sum(axis=1) >= 1]
which I thought would do the trick, but as far as I understand it will instead keep the rows whose sum across the row is greater than 0.5. How can I modify this to suit?
Thanks!
I think a simpler solution is to apply the condition to get a boolean DataFrame, then check with any for at least one True per row, and finally filter by boolean indexing:
print (df.drop('#OTUId',axis=1) > 0.5)
101.BGd_295 103.BGd_309 105.BGd_310 11.BGd_99 123.BGd_312
0 False False False True False
1 False False False False False
2 False True True True True
print ((df.drop('#OTUId',axis=1) > 0.5).any(axis=1))
0 True
1 False
2 True
dtype: bool
df = df[(df.drop('#OTUId',axis=1) > 0.5).any(axis=1)]
print (df)
#OTUId 101.BGd_295 103.BGd_309 105.BGd_310 11.BGd_99 123.BGd_312
0 OTU_200 0.016807 0.238628 0.148211 0.678300 0.126310
2 OTU_2 0.033613 16.584638 19.669701 16.066691 20.925378
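For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample rows in the question, and the names `threshold`, `mask` and `filtered` are my own choices for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "#OTUId": ["OTU_200", "OTU_54", "OTU_2"],
    "101.BGd_295": [0.016806723, 0.253542133, 0.033613445],
    "103.BGd_309": [0.23862789, 0.169383866, 16.58463833],
    "105.BGd_310": [0.148210883, 0.0, 19.66970146],
    "11.BGd_99": [0.6783, 0.113679432, 16.06669119],
    "123.BGd_312": [0.126310471, 0.173943294, 20.92537833],
})

threshold = 0.5  # percent abundance cutoff from the question

# Compare only the numeric sample columns, then flag rows with at least one True
mask = (df.drop("#OTUId", axis=1) > threshold).any(axis=1)

# Boolean indexing keeps the flagged rows, including their #OTUId label
filtered = df[mask]
print(filtered["#OTUId"].tolist())
```

With the sample data this should keep OTU_200 and OTU_2 and drop OTU_54, matching the output above.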
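Your code counts the True values in each row of the boolean mask, so it does not sum the abundances. However, the comparison also includes the #OTUId column, which evaluates to True in every row here, so every row is kept: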
df = df[(df > 0.5).sum(axis=1) >= 1]
#boolean mask
print (df > 0.5)
#OTUId 101.BGd_295 103.BGd_309 105.BGd_310 11.BGd_99 123.BGd_312
0 True False False False True False
1 True False False False False False
2 True False True True True True
#count True values per row
print ((df > 0.5).sum(axis=1))
0 2
1 1
2 5
dtype: int64
#check values by condition
print ((df > 0.5).sum(axis=1) >= 1)
0 True
1 True
2 True
dtype: bool
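One way to make the original counting expression behave as intended is to take `#OTUId` out of the comparison, for example by moving it into the index. The sketch below assumes the IDs are unique; the names `indexed`, `count_mask`, `any_mask` and `kept` are hypothetical. Note also that on Python 3, comparing a string column with a number will typically raise a TypeError rather than return True as in the output above, so excluding the label column is needed either way.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "#OTUId": ["OTU_200", "OTU_54", "OTU_2"],
    "101.BGd_295": [0.016806723, 0.253542133, 0.033613445],
    "103.BGd_309": [0.23862789, 0.169383866, 16.58463833],
    "105.BGd_310": [0.148210883, 0.0, 19.66970146],
    "11.BGd_99": [0.6783, 0.113679432, 16.06669119],
    "123.BGd_312": [0.126310471, 0.173943294, 20.92537833],
})

# Move the label column into the index so only numeric columns are compared
indexed = df.set_index("#OTUId")

# Counting True values per row and requiring at least one ...
count_mask = (indexed > 0.5).sum(axis=1) >= 1
# ... gives the same mask as asking whether any value in the row is True
any_mask = (indexed > 0.5).any(axis=1)
assert count_mask.equals(any_mask)

kept = indexed[any_mask]
print(kept.index.tolist())
```

Once the label column is out of the way, the two expressions are interchangeable; `any` just states the intent more directly.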