Pandas - dropping rows with missing data not working using .isnull(), notnull(), dropna()
This is really weird. I have tried several ways of dropping rows with missing data from a pandas DataFrame, but none of them seem to work. This is the code (I uncomment one method at a time; these are the three I have tried in various forms, and this is the latest):
import pandas as pd
Test = pd.DataFrame({'A':[1,2,3,4,5],'B':[1,2,'NaN',4,5],'C':[1,2,3,'NaT',5]})
print(Test)
#Test = Test.loc[Test.C.notnull()]
#Test = Test.dropna()
Test = Test[~Test[Test.columns.values].isnull()]
print("And now")
print(Test)
But in all cases, all I get is this:
A B C
0 1 1 1
1 2 2 2
2 3 NaN 3
3 4 4 NaT
4 5 5 5
And now
A B C
0 1 1 1
1 2 2 2
2 3 NaN 3
3 4 4 NaT
4 5 5 5
Am I making a mistake, or what is the problem? Ideally, I would like to get this:
A B C
0 1 1 1
1 2 2 2
4 5 5 5
Your example DataFrame has NaN and NaT as strings, which .dropna, .notnull and co. won't treat as missing values, so given your example you can use:
df[~df.isin(['NaN', 'NaT']).any(axis=1)]
Which gives you:
A B C
0 1 1 1
1 2 2 2
4 5 5 5
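To see why the original attempts had no effect, here is a minimal sketch that rebuilds the frame from the question and compares what isnull() and isin() report. The variable names (null_count, has_placeholder, cleaned) are illustrative and not part of the original answer.

```python
import pandas as pd

# Same frame as in the question: the "missing" entries are ordinary strings.
df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [1, 2, 'NaN', 4, 5],
                   'C': [1, 2, 3, 'NaT', 5]})

# isnull() does not treat the strings 'NaN' / 'NaT' as missing,
# so dropna() finds nothing to drop.
null_count = int(df.isnull().sum().sum())
rows_after_dropna = len(df.dropna())

# isin() matches the literal strings; any(axis=1) flags rows containing one.
has_placeholder = df.isin(['NaN', 'NaT']).any(axis=1)
cleaned = df[~has_placeholder]

print(null_count)              # 0
print(rows_after_dropna)       # 5
print(cleaned.index.tolist())  # [0, 1, 4]
```

Because the match is on the exact strings, any other placeholder spelling (for example 'nan' or 'N/A') would need to be added to the list.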
If you had a DataFrame such as this (note the use of np.nan and np.datetime64('NaT') instead of strings):
import numpy as np
df = pd.DataFrame({'A':[1,2,3,4,5],'B':[1,2,np.nan,4,5],'C':[1,2,3,np.datetime64('NaT'),5]})
Then running df.dropna() gives you:
A B C
0 1 1.0 1
1 2 2.0 2
4 5 5.0 5
Note that column B is now a float instead of an integer, as that's required to store NaN values.
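The dtype change can be checked directly. The sketch below uses a standalone Series rather than the full frame; casting back with astype after dropping the missing rows is one possible follow-up, not something the original answer prescribes.

```python
import numpy as np
import pandas as pd

# A column containing np.nan is stored as float64, because NaN is a float value.
with_nan = pd.Series([1, 2, np.nan, 4, 5])
print(with_nan.dtype)    # float64

# Dropping the NaN rows does not change the dtype back by itself.
after_drop = with_nan.dropna()
print(after_drop.dtype)  # float64

# Once no NaN remains, the column can be cast back to integers explicitly.
restored = after_drop.astype('int64')
print(restored.tolist())  # [1, 2, 4, 5]
```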
Try this on the original data:
import numpy as np
Test.replace(["NaN", 'NaT'], np.nan, inplace=True)
Test = Test.dropna()
Test
Or modify the data and do this:
import pandas as pd
import numpy as np
Test = pd.DataFrame({'A':[1,2,3,4,5],'B':[1,2,np.nan,4,5],'C':[1,2,3,pd.NaT,5]})
print(Test)
Test = Test.dropna()
print(Test)
A B C
0 1 1.0 1
1 2 2.0 2
4 5 5.0 5