'Series' objects are mutable, thus they cannot be hashed: error when setting a value in a pandas DataFrame
I want to replace the '?' entries in the 'Bare Nuclei' column of a pandas DataFrame with the mean of the rows where data['Bare Nuclei'] != '?'.
import pandas as pd
import numpy as np
column_names = ['Sample code number', 'Clump Thickness',
'Uniformity of Cell Size', 'Uniformity of Cell Shape',
'Marginal Adhesion', 'Single Epithelial Cell Size',
'Bare Nuclei', 'Bland Chromatin', 'Normal Nucleoli',
'Mitoses', 'Class']
data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', names = column_names )
mean = 0
n = 0
for index, row in data.iterrows():
    if row['Bare Nuclei'] != '?':
        n += 1
        mean += int(row['Bare Nuclei'])
mean = mean / n
temp = data
index = temp['Bare Nuclei'] == '?'
temp[index,'Bare Nuclei'] = mean
I would like to know how to change the values in the DataFrame and why my approach is wrong. Could you help me? I look forward to your help!
In the last line, add DataFrame.loc, because you need to modify a column of the DataFrame:
temp.loc[index,'Bare Nuclei'] = mean
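A minimal sketch of the .loc fix on a small made-up frame. The values are illustrative and not taken from the breast-cancer file, and dtype=object is set explicitly as an assumption so the column can hold both strings and a float:

```python
import pandas as pd

# Toy stand-in for the real data; values are made up for illustration.
temp = pd.DataFrame({
    'Bare Nuclei': pd.Series(['1', '?', '3', '8'], dtype=object),
    'Class': [2, 4, 2, 4],
})

mean = 4.0  # mean of 1, 3 and 8, worked out by hand for this toy frame

# Boolean Series: True on the rows that hold the '?' placeholder
index = temp['Bare Nuclei'] == '?'

# .loc takes (row selector, column label) and assigns in place
temp.loc[index, 'Bare Nuclei'] = mean

print(temp['Bare Nuclei'].tolist())
```

Only the '?' row is changed here. The other entries are still strings, so the column is not yet numeric.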
However, it is best to avoid loops in pandas because they are slow. A better solution is to replace ? with NaN and then fillna with the mean:
data['Bare Nuclei'] = data['Bare Nuclei'].replace('?', np.nan).astype(float)
#more general
#data['Bare Nuclei'] = pd.to_numeric(data['Bare Nuclei'], errors='coerce')
data['Bare Nuclei'] = data['Bare Nuclei'].fillna(data['Bare Nuclei'].mean())
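The same idea can be tried without downloading the file. The sketch below uses a made-up four-row column (an assumption for illustration) and the pd.to_numeric variant mentioned in the comment above:

```python
import pandas as pd

# Toy column with a '?' placeholder; values are made up for illustration.
data = pd.DataFrame({
    'Bare Nuclei': pd.Series(['1', '?', '3', '8'], dtype=object),
})

# Anything that cannot be parsed as a number ('?' here) becomes NaN
numeric = pd.to_numeric(data['Bare Nuclei'], errors='coerce')

# Series.mean() skips NaN by default, so this is the mean of 1, 3 and 8
col_mean = numeric.mean()

# Fill the NaN left by '?' with that mean
data['Bare Nuclei'] = numeric.fillna(col_mean)

print(data['Bare Nuclei'].tolist())
print(data['Bare Nuclei'].dtype)
```

Unlike a direct assignment of the mean into the string column, this leaves the whole column as float64, so later numeric operations work on it.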
Alternative solution:
mask = data['Bare Nuclei'] == '?'
data['Bare Nuclei'] = data['Bare Nuclei'].mask(mask).astype(float)
data['Bare Nuclei'] = data['Bare Nuclei'].fillna(data['Bare Nuclei'].mean())
Verifying the solution:
column_names = ['Sample code number', 'Clump Thickness',
'Uniformity of Cell Size', 'Uniformity of Cell Shape',
'Marginal Adhesion', 'Single Epithelial Cell Size',
'Bare Nuclei', 'Bland Chromatin', 'Normal Nucleoli',
'Mitoses', 'Class']
data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', names = column_names )
#print (data.head())
#get index values by condition
L = data.index[data['Bare Nuclei'] == '?'].tolist()
print (L)
[23, 40, 139, 145, 158, 164, 235, 249, 275, 292, 294, 297, 315, 321, 411, 617]
#get mean of values converted to numeric
print (data['Bare Nuclei'].replace('?', np.nan).astype(float).mean())
3.5446559297218156
print (data.loc[L, 'Bare Nuclei'])
23 ?
40 ?
139 ?
145 ?
158 ?
164 ?
235 ?
249 ?
275 ?
292 ?
294 ?
297 ?
315 ?
321 ?
411 ?
617 ?
Name: Bare Nuclei, dtype: object
#convert to numeric - replace `?` with NaN and cast to float
data['Bare Nuclei'] = data['Bare Nuclei'].replace('?', np.nan).astype(float)
#more general
#data['Bare Nuclei'] = pd.to_numeric(data['Bare Nuclei'], errors='coerce')
#replace NaNs with the column mean
data['Bare Nuclei'] = data['Bare Nuclei'].fillna(data['Bare Nuclei'].mean())
#verify the replacement
print (data.loc[L, 'Bare Nuclei'])
23 3.544656
40 3.544656
139 3.544656
145 3.544656
158 3.544656
164 3.544656
235 3.544656
249 3.544656
275 3.544656
292 3.544656
294 3.544656
297 3.544656
315 3.544656
321 3.544656
411 3.544656
617 3.544656
Name: Bare Nuclei, dtype: float64
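temp[index, 'Bare Nuclei'] mixes boolean indexing with label-based column selection, which does not work with plain square brackets. pandas then treats the tuple (index, 'Bare Nuclei') as a single column key, and a key that contains a Series cannot be hashed, which is where the error message comes from. Instead, change
index = temp['Bare Nuclei'] == '?'
temp[index,'Bare Nuclei'] = mean
to
s = temp['Bare Nuclei']
temp['Bare Nuclei'] = s.where(s != '?', mean)
where(s != '?', mean) keeps every value for which the condition s != '?' holds and replaces the values for which it does not hold with mean (a little confusing at first glance).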
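To make the direction of the condition concrete, here is a small sketch on a made-up Series (the values and the hand-computed mean are assumptions for illustration). It puts Series.where next to its counterpart Series.mask, which replaces where the condition is True:

```python
import pandas as pd

# Toy Series; values are made up for illustration.
s = pd.Series(['1', '?', '3', '8'], dtype=object)
mean = 4.0  # mean of 1, 3 and 8, worked out by hand

# where: keep values where the condition is True, replace the rest
kept = s.where(s != '?', mean)

# mask: replace values where the condition is True, keep the rest
masked = s.mask(s == '?', mean)

print(kept.tolist())
print(kept.equals(masked))

# The untouched entries are still strings; cast if floats are wanted
as_float = kept.astype(float)
print(as_float.tolist())
```

Both calls give the same result. Which one reads better depends on whether it is more natural to state the condition for the rows to keep or for the rows to replace.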