Dropping DataFrame rows based on multiple conditions
I am trying to drop some rows from a pandas DataFrame when 4 conditions are all met in the same row. So I tried the following command:
my_data.drop(my_data[(my_data.column1 is None) & (my_data.column2 is None) & (my_data.column3 is None) & (my_data.column4 is None)].index, inplace=True)
And it throws an error (shown only as an image in the original post).
I've also tried:
my_data= my_data.loc[my_data[(my_data.column1 is None) & (my_data.column2 is None) & (my_data.column3 is None) & (my_data.column4 is None), :]
but without success.
Can I get some help, please? :)
I'm working with Python 3.5.
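An expression such as my_data.column1 is None compares the Series object itself with None, so it evaluates to a single False instead of a row-by-row mask. Normally, a column is checked for null values with the isnull method: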
df.drop(df[df['column1'].isnull()
& df['column2'].isnull()
& df['column3'].isnull()
& df['column4'].isnull()].index)
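To make the difference concrete, here is a minimal sketch on a hypothetical three-row frame (the data is made up for illustration; only the column names follow the question). It contrasts the identity test from the question with the element-wise isnull mask, and uses boolean indexing with the negated mask as an alternative to drop:

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
my_data = pd.DataFrame({
    "column1": [np.nan, 1.0, np.nan],
    "column2": [np.nan, 2.0, 5.0],
})

# `is None` is an identity test on the Series object itself,
# so the result is one plain Python bool, not a mask.
identity_check = my_data.column1 is None

# `.isnull()` tests every element and returns a boolean Series;
# `&` combines the per-column masks row by row.
mask = my_data["column1"].isnull() & my_data["column2"].isnull()

# Keep every row where the combined mask is False.
kept = my_data[~mask]
```

Here only the first row is null in both columns, so it is the only one removed.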
However, there is a more compact and idiomatic way to do that:
df.dropna(subset=['column1', 'column2', 'column3', 'column4'], how='all')
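The how argument decides whether a row must be null in all of the listed columns or in any of them. The sketch below, on a small made-up frame (the extra column named other is an assumption added for illustration), shows both settings and checks that how='all' matches the explicit isnull mask:

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 is null in all four columns, row 1 in none,
# row 2 in two of them.
df = pd.DataFrame({
    "column1": [np.nan, 1.0, np.nan],
    "column2": [np.nan, 2.0, 5.0],
    "column3": [np.nan, 3.0, np.nan],
    "column4": [np.nan, 4.0, 4.5],
    "other": ["a", "b", "c"],
})
cols = ["column1", "column2", "column3", "column4"]

# how='all': drop a row only when every column in `subset` is null.
dropped_all = df.dropna(subset=cols, how="all")

# how='any': drop a row when at least one column in `subset` is null.
dropped_any = df.dropna(subset=cols, how="any")

# The same result as how='all', written as an explicit boolean mask.
by_mask = df[~df[cols].isnull().all(axis=1)]
```

Note that dropna returns a new DataFrame by default and leaves df unchanged, so the result has to be assigned to a name to be kept.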
A demo:
import numpy as np
import pandas as pd

prng = np.random.RandomState(0)
df = pd.DataFrame(prng.randn(100, 6), columns=['column{}'.format(i) for i in range(1, 7)])
df.head()
Out:
column1 column2 column3 column4 column5 column6
0 1.764052 0.400157 0.978738 2.240893 1.867558 -0.977278
1 0.950088 -0.151357 -0.103219 0.410599 0.144044 1.454274
2 0.761038 0.121675 0.443863 0.333674 1.494079 -0.205158
3 0.313068 -0.854096 -2.552990 0.653619 0.864436 -0.742165
4 2.269755 -1.454366 0.045759 -0.187184 1.532779 1.469359
df = df.mask(prng.binomial(1, 0.5, df.shape).astype('bool'), np.nan)
df.head()
Out:
column1 column2 column3 column4 column5 column6
0 NaN 0.400157 NaN 2.240893 NaN NaN
1 0.950088 -0.151357 -0.103219 0.410599 0.144044 NaN
2 0.761038 0.121675 NaN NaN NaN -0.205158
3 NaN NaN -2.552990 NaN 0.864436 NaN
4 2.269755 -1.454366 0.045759 -0.187184 NaN NaN
The following drops a row only if columns 1, 3, 5 and 6 are all null in that row (row 0 above is such a row):
df.dropna(subset=['column1', 'column3', 'column5', 'column6'], how='all').head()
Out:
column1 column2 column3 column4 column5 column6
1 0.950088 -0.151357 -0.103219 0.410599 0.144044 NaN
2 0.761038 0.121675 NaN NaN NaN -0.205158
3 NaN NaN -2.552990 NaN 0.864436 NaN
4 2.269755 -1.454366 0.045759 -0.187184 NaN NaN
5 0.154947 0.378163 -0.887786 -1.980796 -0.347912 NaN
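As a sanity check, one can count the rows that are null in all four columns and confirm that exactly those rows disappear. This is a sketch that rebuilds the demo frame from the same seed; the variable names subset, all_null and result are introduced here for illustration:

```python
import numpy as np
import pandas as pd

# Rebuild the demo frame with the same seed as above.
prng = np.random.RandomState(0)
df = pd.DataFrame(prng.randn(100, 6),
                  columns=['column{}'.format(i) for i in range(1, 7)])
df = df.mask(prng.binomial(1, 0.5, df.shape).astype('bool'), np.nan)

subset = ['column1', 'column3', 'column5', 'column6']

# True for rows where every column in `subset` is null.
all_null = df[subset].isnull().all(axis=1)

result = df.dropna(subset=subset, how='all')

# The surviving index labels are exactly the rows not flagged above.
print(len(df), int(all_null.sum()), len(result))
```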