How to delete rows from a dataframe if specific columns contain null values?
I need to delete rows from a dataframe if specific columns contain null values.
In this example, a row should be deleted if both col2 and col3 are null:
import pandas as pd
obj = {'col1': [1, 2,7,47,12,67,58], 'col2': [741, 332,7,'Nan',127,'Nan',548], 'col3': ['Nan', 2,74,'Nan',127,'Nan',548] }
df = pd.DataFrame(data=obj)
print(df)
   col1 col2 col3
0     1  741  Nan
1     2  332    2
2     7    7   74
3    47  Nan  Nan
4    12  127  127
5    67  Nan  Nan
6    58  548  548
After deletion, the result should be:
df.head()
   col1 col2 col3
0     1  741  Nan
1     2  332    2
2     7    7   74
4    12  127  127
6    58  548  548
Thanks to all!
Use Boolean indexing with DataFrame.isna or DataFrame.isnull to check for NaN or null values. Select the maximum number of NaN allowed per row with DataFrame.sum and Series.le:
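import numpy as np

# Turn the 'Nan' placeholder strings into real missing values
df = df.replace('Nan', np.nan)
# Keep rows that have at most one missing value
new_df = df[df.isnull().sum(axis=1).le(1)]
print(new_df)
   col1   col2   col3
0     1  741.0    NaN
1     2  332.0    2.0
2     7    7.0   74.0
4    12  127.0  127.0
6    58  548.0  548.0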
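To make the intermediate steps visible, here is a minimal sketch of the same filter split into parts. It assumes the frame is built with np.nan directly instead of the 'Nan' strings, and the names nan_per_row and keep are only illustrative.

```python
import numpy as np
import pandas as pd

# Same data as in the question, but with real NaN values (an assumption
# made here so that no replace step is needed).
df = pd.DataFrame({
    'col1': [1, 2, 7, 47, 12, 67, 58],
    'col2': [741, 332, 7, np.nan, 127, np.nan, 548],
    'col3': [np.nan, 2, 74, np.nan, 127, np.nan, 548],
})

# Number of missing values in each row, counted across all columns.
nan_per_row = df.isnull().sum(axis=1)

# True for rows with at most one missing value.
keep = nan_per_row.le(1)

new_df = df[keep]
print(nan_per_row.tolist())   # [1, 0, 0, 2, 0, 2, 0]
print(new_df.index.tolist())  # [0, 1, 2, 4, 6]
```

Note that the count runs over every column, col1 included. It gives the wanted result here only because col1 has no missing values.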
To check only specific columns:
df = df.replace('Nan', np.nan)
df_filtered = df[~df[['col2', 'col3']].isnull().all(axis=1)]
print(df_filtered)
   col1   col2   col3
0     1  741.0    NaN
1     2  332.0    2.0
2     7    7.0   74.0
4    12  127.0  127.0
6    58  548.0  548.0
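As a cross-check, the column-restricted mask can be compared with DataFrame.dropna using subset and how='all', which should drop the same rows. This is a sketch that assumes the frame already holds real NaN values; cols, by_mask and by_dropna are illustrative names.

```python
import numpy as np
import pandas as pd

# Data from the question with real NaN values (assumed, to skip the replace step).
df = pd.DataFrame({
    'col1': [1, 2, 7, 47, 12, 67, 58],
    'col2': [741, 332, 7, np.nan, 127, np.nan, 548],
    'col3': [np.nan, 2, 74, np.nan, 127, np.nan, 548],
})

cols = ['col2', 'col3']

# Boolean mask: True where every listed column is null in that row.
all_null = df[cols].isnull().all(axis=1)
by_mask = df[~all_null]

# Same rule expressed with dropna: drop a row only if all of cols are null.
by_dropna = df.dropna(subset=cols, how='all')

print(by_mask.equals(by_dropna))    # True
print(by_dropna.index.tolist())     # [0, 1, 2, 4, 6]
```

The mask form is easier to combine with other conditions, while dropna states the intent more directly.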
axis=0 drops rows, and thresh=1 is the minimum number of non-null values a row must have to be kept; rows with fewer are dropped.
You can use subset=['col2', 'col3'] to define the columns on which the decision to drop a row is based.
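You can try this, once the 'Nan' strings have been replaced with np.nan (how and thresh cannot be passed together in recent pandas versions, so only thresh is used):
df = df.dropna(axis=0, subset=['col2', 'col3'], thresh=1)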
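The effect of thresh is easiest to see by comparing two values. The sketch below assumes the frame already contains real NaN values; by_thresh_1 and by_thresh_2 are illustrative names.

```python
import numpy as np
import pandas as pd

# Data from the question with real NaN values (assumed for this sketch).
df = pd.DataFrame({
    'col1': [1, 2, 7, 47, 12, 67, 58],
    'col2': [741, 332, 7, np.nan, 127, np.nan, 548],
    'col3': [np.nan, 2, 74, np.nan, 127, np.nan, 548],
})

# thresh=1: keep rows with at least one non-null value among col2 and col3.
by_thresh_1 = df.dropna(axis=0, subset=['col2', 'col3'], thresh=1)

# thresh=2: keep rows only if both col2 and col3 are non-null.
by_thresh_2 = df.dropna(axis=0, subset=['col2', 'col3'], thresh=2)

print(by_thresh_1.index.tolist())  # [0, 1, 2, 4, 6]
print(by_thresh_2.index.tolist())  # [1, 2, 4, 6]
```

With thresh=1, row 0 survives because col2 is present; with thresh=2 it is dropped because col3 is missing.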
After applying the solution proposed by @ansev, everything worked:
import numpy as np
import pandas as pd
obj = {'col1': [1, 2,7,47,12,67,58], 'col2': [741, 332,7,'Nan',127,'Nan',548], 'col3': ['Nan', 2,74,'Nan',127,'Nan',548] }
df = pd.DataFrame(data=obj)
df = df.replace('Nan', np.nan)
df_filtered = df[~df[['col2', 'col3']].isnull().all(axis=1)]
print(df_filtered)
   col1   col2   col3
0     1  741.0    NaN
1     2  332.0    2.0
2     7    7.0   74.0
4    12  127.0  127.0
6    58  548.0  548.0