“Can't convert float NaN to int” but no NaN?
I have a dataframe and am trying to do the following:
data['SD_rates']=np.array([int((data['actual value'][i]-data['means'][i])/data['std'][i]) for i in range (len(data['means']))])
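It breaks with the message: "can't convert float NaN to int".
I understand what the error means, but I tested the df with data.isnull() and none of the columns involved contains a NaN (I also checked manually by dumping the frame with data.to_csv).
I even filled data['std'] with fillna(-1, inplace=True), but it still breaks. I don't understand why: there is no division by 0 (I checked that this column contains no zeros, so there is no original 0, and Null/NaN values are filled with -1), the missing values in the actual values and the means are filled with fillna(0), and in any case the subtraction cannot produce a NaN (the data range is [0-10]).
Any idea what is going on? (As I said, the data just before the operation that triggers the error is correct...) Thanks.
One of my hypotheses is that groupby may generate NaNs that I don't get rid of when computing the means (although I believed pandas ignores them automatically...) and that are not filled with 0 or -1 (I deliberately chose -1 for the standard deviation to avoid a division by 0).
Here is a code snippet: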
def stats_setting(data):
    print('Stats settings')
    print(data.columns)
    print(data.dtypes)
    #sys.exit()
    data['marks']=np.log1p(data['marks'].astype(float))
    data['students']=np.log1p(data['students'].astype(float))#Rossman9 think this has to be tested
    #were filled with fillna before)
    #First Part: by studentType and Assortment
    types_DoM_select=['Type','Type2','Category']
    #First Block:types_DoM students grouped by categories
    #wonder if can do a groupby of groupb
    print("types_DoM_marks_means")
    types_DoM_marks_means = data.groupby(types_DoM_select)['marks'].mean()
    types_DoM_marks_means.name = 'types_DoM_marks_means'
    types_DoM_marks_means = types_DoM_marks_means.reset_index()
    data = pd.merge(data, types_DoM_marks_means, on = types_DoM_select, how='left')
    print("types_DoM_students_means")
    types_DoM_students_means = data.groupby(types_DoM_select)['students'].mean() #.students won't work. Why?
    types_DoM_students_means.name = 'types_DoM_students_means'
    types_DoM_students_means=types_DoM_students_means.reset_index()
    data = pd.merge(data, types_DoM_students_means, on = types_DoM_select, how='left')
    print("types_DoM_marks_medians")
    types_DoM_marks_medians = data.groupby(types_DoM_select)['marks'].median()
    types_DoM_marks_medians.name = 'types_DoM_marks_medians'
    types_DoM_marks_medians = types_DoM_marks_medians.reset_index()
    data = pd.merge(data, types_DoM_marks_medians, on = types_DoM_select, how='left')
    print("types_DoM_students_medians")
    types_DoM_students_medians = data.groupby(types_DoM_select)['students'].median() #.students won't work. Why?
    types_DoM_students_medians.name = 'types_DoM_students_medians'
    types_DoM_students_medians=types_DoM_students_medians.reset_index()
    data = pd.merge(data, types_DoM_students_medians, on = types_DoM_select, how='left')
    print("types_DoM_marks_std")
    types_DoM_marks_std = data.groupby(types_DoM_select)['marks'].std()
    types_DoM_marks_std.name = 'types_DoM_marks_std'
    types_DoM_marks_std = types_DoM_marks_std.reset_index()
    data = pd.merge(data, types_DoM_marks_std, on = types_DoM_select, how='left')
    print("types_DoM_students_std")
    types_DoM_students_std = data.groupby(types_DoM_select)['students'].std()
    types_DoM_students_std.name = 'types_DoM_students_std'
    types_DoM_students_std = types_DoM_students_std.reset_index()
    data = pd.merge(data, types_DoM_students_std, on = types_DoM_select, how='left')
    data['types_DoM_marks_means'].fillna(-1, inplace=True)
    data['types_DoM_students_means'].fillna(-1, inplace=True)
    data['types_DoM_marks_medians'].fillna(-1, inplace=True)
    data['types_DoM_students_medians'].fillna(-1, inplace=True)
    data['types_DoM_marks_std'].fillna(-1, inplace=True)
    data['types_DoM_students_std'].fillna(-1, inplace=True)
    #Second Part: by specific student
    student_DoM_select=['Type','Type2','Category']
    #First Block:student_DoM
    #wonder if can do a groupby of groupb
    print("student_DoM_marks_means")
    student_DoM_marks_means = data.groupby(student_DoM_select)['marks'].mean()
    student_DoM_marks_means.name = 'student_DoM_marks_means'
    student_DoM_marks_means = student_DoM_marks_means.reset_index()
    data = pd.merge(data, student_DoM_marks_means, on = student_DoM_select, how='left')
    print("student_DoM_students_means")
    student_DoM_students_means = data.groupby(student_DoM_select)['students'].mean() #.students won't work. Why?
    student_DoM_students_means.name = 'student_DoM_students_means'
    student_DoM_students_means=student_DoM_students_means.reset_index()
    data = pd.merge(data, student_DoM_students_means, on = student_DoM_select, how='left')
    print("student_DoM_marks_medians")
    student_DoM_marks_medians = data.groupby(student_DoM_select)['marks'].median()
    student_DoM_marks_medians.name = 'student_DoM_marks_medians'
    student_DoM_marks_medians = student_DoM_marks_medians.reset_index()
    data = pd.merge(data, student_DoM_marks_medians, on = student_DoM_select, how='left')
    print("student_DoM_students_medians")
    student_DoM_students_medians = data.groupby(student_DoM_select)['students'].median() #.students won't work. Why?
    student_DoM_students_medians.name = 'student_DoM_students_medians'
    student_DoM_students_medians=student_DoM_students_medians.reset_index()
    data = pd.merge(data, student_DoM_students_medians, on = student_DoM_select, how='left')
    # May I use data['marks','students','marksMean','studentsMean','marksMedian','studentsMedian']=data['marks','students','marksMean','studentsMean','marksMedian','studentsMedian'].astype(int) to spare memory?
    print("student_DoM_marks_std")
    student_DoM_marks_std = data.groupby(student_DoM_select)['marks'].std()
    student_DoM_marks_std.name = 'student_DoM_marks_std'
    student_DoM_marks_std = student_DoM_marks_std.reset_index()
    data = pd.merge(data, student_DoM_marks_std, on = student_DoM_select, how='left')
    print("student_DoM_students_std")
    student_DoM_students_std = data.groupby(student_DoM_select)['students'].std()
    student_DoM_students_std.name = 'student_DoM_students_std'
    student_DoM_students_std = student_DoM_students_std.reset_index()
    data = pd.merge(data, student_DoM_students_std, on = student_DoM_select, how='left')
    data['student_DoM_marks_means'].fillna(0, inplace=True)
    data['student_DoM_students_means'].fillna(0, inplace=True)
    data['student_DoM_marks_medians'].fillna(0, inplace=True)
    data['student_DoM_students_medians'].fillna(0, inplace=True)
    data['student_DoM_marks_std'].fillna(0, inplace=True)
    data['student_DoM_students_std'].fillna(0, inplace=True)
    #Third Part: Exceptional students
    #I think int is better here as it helps defining categories but can't use it.#
    #print(data.isnull().sum())
    #print(data['types_DoM_marks_std'][data['types_DoM_marks_std']==0].sum())
    #data.to_csv('ex')
    #print(data.columns)
    #Original version:#int raises the "can't convert Nan float to int. While there were no Nan as I verified in the data just before sending it to the
    data['Except_student_IP2_DoM_marks_means']=np.array([int((data['student_IP2_DoM_marks_means'][i]-data['types_IP2_DoM_marks_means'][i])/data['types_IP2_DoM_students_std'][i]) for i in range (len(data['year']))])
    data['Except_student_IP2_DoM_marks_medians']=np.array([int((data['student_IP2_DoM_marks_medians'][i]-data['types_IP2_DoM_marks_means'][i])/data['types_IP2_DoM_students_std'][i]) for i in range (len(data['year']))])
    #Second version: raises no error but final data (returned) is filled with these stupid NaN
    data['Except_student_P2M_DoM_marks_means']=np.array([np.round((data['student_DoM_marks_means'][i]-data['types_DoM_marks_means'][i])/data['types_DoM_marks_std'][i],0) for i in range (len(data['year']))])
    data['Except_student_P2M_DoM_marks_medians']=np.array([np.round((data['student_DoM_marks_medians'][i]-data['types_DoM_marks_medians'][i])/data['types_DoM_marks_std'][i],0) for i in range (len(data['year']))])
    #End
    return data
You are most likely right that there are no NaNs in your dataframe, but you are creating them in your calculation. See the following:
In [15]: import pandas as pd

In [16]: df = pd.DataFrame([[1, 2], [0, 0]], columns=['actual value', 'col2'])
   ....: df['means'] = df.mean(axis=1)
   ....: df['std'] = df.std(axis=1)

In [17]: df
Out[17]:
   actual value  col2  means  std
0             1     2    1.5  0.5
1             0     0    0.0  0.0
So the dataframe has no NaNs, but what about the calculation?
In [21]: [(df['actual value'][i]-df['means'][i])/df['std'][i] for i in range (len(df['means']))]
Out[21]: [-1.0, nan]
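The same check can be done on a whole column to find out which rows are responsible. The following is only a minimal sketch that rebuilds the small `df` from above; the names `ratio` and `bad_rows` are illustrative and do not come from the question.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [0, 0]], columns=['actual value', 'col2'])
df['means'] = df.mean(axis=1)
df['std'] = df.std(axis=1)

# The inputs are clean: no NaN in any column
print(df.isnull().sum().sum())   # 0

# The NaN only appears in the result of the division (0 / 0 here)
ratio = (df['actual value'] - df['means']) / df['std']
bad_rows = ratio[ratio.isnull()].index.tolist()
print(bad_rows)                  # [1]

# In this example the offending row is the one where 'std' is exactly zero
print(df.loc[bad_rows, ['actual value', 'means', 'std']])
```

Running the equivalent check on the real data, just before the line that fails, should show whether zero standard deviations are the cause there as well.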
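Now, when you call int on each element of that list, you get the error. Finally, I would suggest (if possible) performing the operation directly on the underlying arrays rather than using a for loop, as it will be much faster.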
In [25]: (df['actual value']-df['means'])/df['std']
Out[25]:
0    -1
1   NaN
dtype: float64
However, this may not be possible, depending on the value you want a division by 0 to return.
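If you can decide up front what a zero standard deviation should mean, the vectorized form can still be used. The sketch below shows one possible policy, not the only reasonable one: it assumes that a row whose `std` is zero should get a score of 0. The sample values are made up, and the column name `SD_rates` is taken from the question.

```python
import numpy as np
import pandas as pd

# Made-up sample data; the middle row has a zero standard deviation
df = pd.DataFrame({
    'actual value': [1.0, 0.0, 4.0],
    'means': [1.5, 0.0, 2.0],
    'std': [0.5, 0.0, 1.0],
})

# Turn zero std into NaN so the division yields NaN (never inf)
# for the rows that cannot be scored
safe_std = df['std'].replace(0, np.nan)
z = (df['actual value'] - df['means']) / safe_std

# Assumed policy: unscorable rows get 0; astype(int) then truncates
# toward zero, like int() in the original loop
df['SD_rates'] = z.fillna(0).astype(int)
print(df['SD_rates'].tolist())   # [-1, 0, 2]
```

Filling before the cast is what avoids the error, since `astype(int)` fails on NaN in the same way `int()` does.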