AttributeError: 'DataFrame' object has no attribute 'tolist'
When I run this code in Jupyter Notebook:
    columns = ['nkill', 'nkillus', 'nkillter', 'nwound', 'nwoundus', 'nwoundte', 'propvalue', 'nperps', 'nperpcap', 'iyear', 'imonth', 'iday']
    for col in columns:
        # needed for any missing values set to '-99'
        df[col] = [np.nan if (x < 0) else x for x in df[col].tolist()]
        # calculate the mean of the column
        column_temp = [0 if math.isnan(x) else x for x in df[col].tolist()]
        mean = round(np.mean(column_temp))
        # then apply the mean to all NaNs
        df[col].fillna(mean, inplace=True)
I receive the following error:
    AttributeError                            Traceback (most recent call last)
    <ipython-input-56-f8a0a0f314e6> in <module>()
          3 for col in columns:
          4     # needed for any missing values set to '-99'
    ----> 5     df[col] = [np.nan if (x < 0) else x for x in df[col].tolist()]
          6     # calculate the mean of the column
          7     column_temp = [0 if math.isnan(x) else x for x in df[col].tolist()]

    /anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
       4374             if self._info_axis._can_hold_identifiers_and_holds_name(name):
       4375                 return self[name]
    -> 4376             return object.__getattribute__(self, name)
       4377
       4378     def __setattr__(self, name, value):

    AttributeError: 'DataFrame' object has no attribute 'tolist'
The code works fine when I run it in PyCharm, and all of my research has led me to conclude that it should be fine. Am I missing something?
I've created a Minimal, Complete, and Verifiable example below:
    import numpy as np
    import pandas as pd
    import os
    import math

    # get the path to the current working directory
    cwd = os.getcwd()
    # then add the name of the Excel file, including its extension to get its relative path
    # Note: make sure the Excel file is stored inside the cwd
    file_path = cwd + "/data.xlsx"
    # Copy the database to file
    df = pd.read_excel(file_path)

    columns = ['nkill', 'nkillus', 'nkillter', 'nwound', 'nwoundus', 'nwoundte', 'propvalue', 'nperps', 'nperpcap', 'iyear', 'imonth', 'iday']
    for col in columns:
        # needed for any missing values set to '-99'
        df[col] = [np.nan if (x < 0) else x for x in df[col].tolist()]
        # calculate the mean of the column
        column_temp = [0 if math.isnan(x) else x for x in df[col].tolist()]
        mean = round(np.mean(column_temp))
        # then apply the mean to all NaNs
        df[col].fillna(mean, inplace=True)
You have an XY Problem. You've described what you are trying to achieve in your comments, but your approach is not appropriate for Pandas.
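As for the error message itself, one possible cause (an assumption, since the Excel file and the earlier notebook cells are not shown) is that df in the notebook session carries a duplicated column label. When two columns share a name, df[col] returns a DataFrame rather than a Series, and a DataFrame has no tolist method. A minimal sketch with made-up data:

```python
import pandas as pd

# Hypothetical frame in which the label 'nkill' appears twice
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]],
                  columns=['nkill', 'nkill', 'nwound'])

unique_kind = type(df['nwound']).__name__     # selection by a unique label
duplicate_kind = type(df['nkill']).__name__   # selection by a duplicated label
print(unique_kind, duplicate_kind)

# List any labels that occur more than once
duplicated_labels = df.columns[df.columns.duplicated()].tolist()
print(duplicated_labels)
```

Running the last two lines against the real df in the failing notebook would confirm or rule out this explanation.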
Avoid for loops and list

With Pandas, you should look to avoid explicit for loops or conversion to Python list. Pandas builds on NumPy arrays which support vectorised column-wise operations.
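As a rough illustration of the difference, the sketch below replaces negative values with NaN twice: once through a Python list, as in the question, and once with a single vectorised Series method. The Series s and its values are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data; negative values stand in for missing entries
s = pd.Series([3, -99, 5, -99, 8])

# List-based approach: each element is handled one at a time in Python
via_list = pd.Series([np.nan if x < 0 else x for x in s.tolist()])

# Vectorised approach: one column-wise operation on the whole Series
via_mask = s.mask(s < 0)

# Both routes should produce the same values
print(via_list.equals(via_mask))
```

The vectorised form is shorter and keeps the work inside Pandas and NumPy instead of the Python interpreter.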
So let's look at how you can rewrite:
    for col in columns:
        # values less than 0 set to NaN
        # calculate the mean of the column with 0 for NaN
        # then apply the mean to all NaNs
You can now use Pandas methods to achieve the above.

apply + pd.to_numeric + mask + fillna

You can define a function mean_update and use pd.DataFrame.apply to apply it to each series:
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'A': [1, -2, 3, np.nan],
                       'B': ['hello', 4, 5, np.nan],
                       'C': [-1.5, 3, np.nan, np.nan]})

    def mean_update(s):
        s_num = pd.to_numeric(s, errors='coerce')  # convert to numeric
        s_num = s_num.mask(s_num < 0)              # replace values less than 0 with NaN
        s_mean = s_num.fillna(0).mean()            # calculate mean
        return s_num.fillna(s_mean)                # replace NaN with mean

    df = df.apply(mean_update)                     # apply to each series

    print(df)

         A     B     C
    0  1.0  2.25  0.75
    1  1.0  4.00  3.00
    2  3.0  5.00  0.75
    3  1.0  2.25  0.75
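
To limit this to the columns listed in the question rather than the whole frame, the same function can be applied to a subset of columns and assigned back. The sketch below is only illustrative: the frame is made up (two of the question's column names plus a hypothetical text column, city, that is left untouched), and the mean is rounded as in the original loop.

```python
import numpy as np
import pandas as pd

def mean_update(s):
    s_num = pd.to_numeric(s, errors='coerce')   # convert to numeric
    s_num = s_num.mask(s_num < 0)               # negative values become NaN
    s_mean = round(s_num.fillna(0).mean())      # mean with NaN counted as 0, rounded
    return s_num.fillna(s_mean)                 # replace NaN with that mean

# Hypothetical data; -99 marks a missing value as in the question
df = pd.DataFrame({'nkill': [2, -99, 6, np.nan],
                   'nwound': [-99, 10, np.nan, 6],
                   'city': ['a', 'b', 'c', 'd']})

columns = ['nkill', 'nwound']
df[columns] = df[columns].apply(mean_update)    # only the listed columns change

print(df)
```

Assigning the result back to df[columns] avoids calling fillna with inplace=True on a column selection, which may not update the original frame reliably.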