Selecting rows from Pandas Dataframe with same values in one column that have only missing in another
In the following code, under column A, foo and tog have only missing values in column B. However, I can't simply use isna() to filter out all missing values, since one bar row also has a missing value.
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'tog', 'bar', 'bar'],
                   'B': [np.nan, 2, np.nan, 4, np.nan, 6, np.nan],
                   'C': [2.0, 5., 8., 1., 2., 9., 3.]})
I've tried df.groupby('A').filter(df['B'] == 'NaN'), but that returns an error:
'Series' object is not callable.
How can I filter or select for foo and tog? Much appreciated!
Edit: I'm cleaning a dataset that has a few missing values, but they are spread out across a lot of rows. As such, I can't simply select by the named elements in column A (e.g. foo and tog).
In other words, I need the following:
     A    B    C
1  bar  2.0  5.0
3  bar  4.0  1.0
5  bar  6.0  9.0
6  bar  NaN  3.0
filter expects a function, and you can pass one that checks whether not all of the values in B are NaN:
df.groupby("A").filter(lambda x: ~x.B.isna().all())
to get
     A    B    C
1  bar  2.0  5.0
3  bar  4.0  1.0
5  bar  6.0  9.0
6  bar  NaN  3.0
where foo and tog are filtered out, since all of their values in column B are NaN.
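For reference, here is a self-contained sketch of the same idea, assuming the sample frame from the question. The names kept, has_value, kept_mask and all_missing are illustrative only and do not come from the original post. It uses notna().any(), which should be equivalent to "not all values are NaN", and it also shows a boolean-mask variant built with transform, which may be preferable on larger frames because it avoids calling a Python function for every group. Note that the original attempt failed because filter needs a callable rather than a Series, and comparing against the string 'NaN' would not match real missing values in any case.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "tog", "bar", "bar"],
    "B": [np.nan, 2, np.nan, 4, np.nan, 6, np.nan],
    "C": [2.0, 5.0, 8.0, 1.0, 2.0, 9.0, 3.0],
})

# Keep every group in A that has at least one non-missing value in B.
kept = df.groupby("A").filter(lambda g: g["B"].notna().any())

# Mask-based alternative: compute the per-group check once and
# broadcast it back onto the original rows.
has_value = df["B"].notna().groupby(df["A"]).transform("any")
kept_mask = df[has_value]

# The complement selects the groups whose B values are all missing.
all_missing = df[~has_value]

print(kept)
print(all_missing)
```

With the mask in hand, df[~has_value] gives the foo and tog rows directly, which is handy if the goal is to inspect the all-missing groups before dropping them.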