pandas dropna not working as expected on finding mean
When I run the code below I get the error:
TypeError: 'NoneType' object has no attribute '__getitem__'
import pyarrow
import pandas
import pyarrow.parquet as pq
df = pq.read_table("file.parquet").to_pandas()
df = df.iloc[1:,:]
df = df.dropna (how="any", inplace = True) # modifies it in place, creates new dataset without NAN
average_age = df["_c2"].mean()
print average_age
The dataframe looks like this:
_c0 _c1 _c2
0 RecId Class Age
1 1 1st 29
2 2 1st NA
3 3 1st 30
If I print the df after calling the dropna method, I get 'None'.
Shouldn't it be creating a new dataframe without the 'NA' in it, which would then allow me to get the average age without throwing an error?
As per OP's comment, the NA is a string rather than NaN, so dropna() is no good here. One of many possible options for filtering out the string value 'NA' is:
df = df[df["_c2"] != "NA"]
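To make the two separate problems concrete, here is a minimal sketch. It assumes a hand-built frame standing in for the rows read from file.parquet, with every value stored as a string. It shows that dropna(inplace=True) returns None (which is why df ends up as None in the question), that dropna leaves the string 'NA' alone, and that the filter above removes it. The astype(int) conversion is my addition, on the assumption that the remaining ages are still strings.

```python
import pandas as pd

# Hand-built stand-in for the question's data after df.iloc[1:, :] (assumption).
df = pd.DataFrame(
    {"_c0": ["1", "2", "3"], "_c1": ["1st", "1st", "1st"], "_c2": ["29", "NA", "30"]},
    index=[1, 2, 3],
)

# dropna(inplace=True) mutates the frame and returns None, so assigning the
# result back to df is what leaves df as None in the question.
result = df.copy().dropna(how="any", inplace=True)
print(result)  # None

# Without inplace, dropna returns a new frame, but the string "NA" is not a
# missing value, so no row is dropped.
rows_after_dropna = len(df.dropna(how="any"))
print(rows_after_dropna)  # 3

# Filter the string explicitly, then convert to numbers before averaging.
filtered = df[df["_c2"] != "NA"]
average_age = filtered["_c2"].astype(int).mean()
print(average_age)  # 29.5
```

So either call df.dropna(how="any", inplace=True) without assigning it, or assign the result of df.dropna(how="any") without inplace, but not both.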
A better option to catch inexact matches (e.g. with trailing spaces), as suggested by @DJK in the comments:
df = df[~df["_c2"].str.contains('NA')]
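A small sketch of the difference, using made-up values with padding around 'NA'. The exact comparison lets the padded strings through, while the substring match drops them. The str.strip() variant is an alternative I am adding, not part of the original suggestion; it may be preferable when other values could legitimately contain the letters "NA", since str.contains matches any substring.

```python
import pandas as pd

# Made-up values: two of the "NA" markers carry stray whitespace.
ages = pd.Series(["29", "NA ", " NA", "30"], name="_c2")

# Exact comparison: the padded strings are not equal to "NA", so they survive.
exact = ages[ages != "NA"]

# Substring match: anything containing "NA" is dropped.
substring = ages[~ages.str.contains("NA")]

# Alternative (assumption, not from the original answer): trim, then compare.
stripped = ages[ages.str.strip() != "NA"]

print(exact.tolist())      # ['29', 'NA ', ' NA', '30']
print(substring.tolist())  # ['29', '30']
print(stripped.tolist())   # ['29', '30']
```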
This one should remove any non-numeric strings rather than only 'NA':
df = df[df["_c2"].apply(lambda x: x.isnumeric())]
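As a rough illustration on made-up values, the sketch below builds the isnumeric mask and also shows the vectorized .str.isnumeric() form, which gives the same mask here. One caveat worth knowing: str.isnumeric() is False for strings such as "29.5" or "-3", so decimal or negative values would be dropped as well.

```python
import pandas as pd

# Made-up values: two different non-numeric markers.
ages = pd.Series(["29", "NA", "n/a", "30"], name="_c2")

# Row-wise check, as in the line above.
mask = ages.apply(lambda x: x.isnumeric())

# Vectorized equivalent via the .str accessor.
vectorized = ages.str.isnumeric()

print(mask.tolist())  # [True, False, False, True]

# The surviving values are still strings, so convert before averaging.
average_age = ages[mask].astype(int).mean()
print(average_age)  # 29.5

# Caveat: decimals and signs are not "numeric" for this method.
print("29.5".isnumeric(), "-3".isnumeric())  # False False
```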
This will work. Note that if the NA in your df is a real NaN (np.nan), it will not affect getting the mean of the column; the problem only arises if your NA is the string 'NA'.
df.apply(pd.to_numeric, errors='coerce', axis=1).describe()
Out[9]:
_c0 _c1 _c2
count 3.0 0.0 2.000000
mean 2.0 NaN 29.500000
std 1.0 NaN 0.707107
min 1.0 NaN 29.000000
25% 1.5 NaN 29.250000
50% 2.0 NaN 29.500000
75% 2.5 NaN 29.750000
max 3.0 NaN 30.000000
More info
df.apply(pd.to_numeric, errors='coerce', axis=1)  # every non-numeric value becomes NaN and will not affect the mean
Out[10]:
_c0 _c1 _c2
0 NaN NaN NaN
1 1.0 NaN 29.0
2 2.0 NaN NaN
3 3.0 NaN 30.0
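If only the average age is needed, the same coercion can be applied to the single column. This is a minimal sketch on a hand-built copy of the frame from the question (an assumption, since the original data comes from file.parquet), with the header row still present as row 0.

```python
import pandas as pd

# Hand-built copy of the question's frame, header row included (assumption).
df = pd.DataFrame(
    {
        "_c0": ["RecId", "1", "2", "3"],
        "_c1": ["Class", "1st", "1st", "1st"],
        "_c2": ["Age", "29", "NA", "30"],
    }
)

# Coerce just the age column: "Age" and "NA" both become NaN.
ages = pd.to_numeric(df["_c2"], errors="coerce")
print(ages.tolist())  # [nan, 29.0, nan, 30.0]

# mean() skips NaN by default, so no explicit filtering is required.
average_age = ages.mean()
print(average_age)  # 29.5
```

Because the header row is coerced to NaN too, this version does not depend on the df.iloc[1:, :] step.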