
Setting a value in a pandas DataFrame raises the error: 'Series' objects are mutable, thus they cannot be hashed

I want to change values in a pandas DataFrame based on a condition: compute the mean of the rows where data['Bare Nuclei'] != '?' and write it into the rows where the value is '?'.

import pandas as pd
import numpy as np
column_names = ['Sample code number', 'Clump Thickness', 
                'Uniformity of Cell Size', 'Uniformity of Cell Shape',
                'Marginal Adhesion', 'Single Epithelial Cell Size',
                'Bare Nuclei', 'Bland Chromatin', 'Normal Nucleoli',
                'Mitoses', 'Class']
data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', names = column_names )
mean = 0
n = 0
for index,row in data.iterrows():
    if row['Bare Nuclei'] != '?':
        n += 1
        mean += int(row['Bare Nuclei'])
mean = mean / n
temp = data
index = temp['Bare Nuclei'] == '?'
temp[index,'Bare Nuclei'] = mean

Running this in Jupyter Notebook raises the error: 'Series' objects are mutable, thus they cannot be hashed

I want to know how to change the values in the DataFrame and why my approach is wrong. Thanks in advance for your help!

In the last line, use DataFrame.loc, because you need to select rows by a boolean mask and a column by label at the same time:

temp.loc[index,'Bare Nuclei'] = mean
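
To see the difference on something small, here is a minimal sketch using a made-up three-row frame (the values and the replacement number 2.0 are assumptions for illustration, not taken from the real data set). Plain [] indexing receives the mask and the column name as one tuple key, while .loc accepts a row selector and a column label separately:

```python
import pandas as pd

# Toy frame standing in for the real data set (values are made up)
df = pd.DataFrame({'Bare Nuclei': ['1', '?', '3'], 'Class': [2, 4, 2]})

# Boolean mask: True for the rows that hold the '?' placeholder
is_missing = df['Bare Nuclei'] == '?'

# df[is_missing, 'Bare Nuclei'] = 2.0 would hand pandas the tuple
# (is_missing, 'Bare Nuclei') as a single column key, which is what fails.
# .loc takes the row selector and the column label as two separate arguments.
df.loc[is_missing, 'Bare Nuclei'] = 2.0

result = df['Bare Nuclei'].tolist()
print(result)
```

The column still has object dtype afterwards, because the other entries are strings; converting it to numbers is a separate step.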

However, it is best to avoid loops in pandas because they are slow. A better solution is to replace ? with NaN and then fill the missing values with the column mean using fillna:

data['Bare Nuclei'] = data['Bare Nuclei'].replace('?', np.nan).astype(float)
#more general
#data['Bare Nuclei'] = pd.to_numeric(data['Bare Nuclei'], errors='coerce')
data['Bare Nuclei'] = data['Bare Nuclei'].fillna(data['Bare Nuclei'].mean())
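
The commented-out pd.to_numeric variant can be tried on a small Series. This is only a sketch with made-up values; errors='coerce' turns anything that cannot be parsed as a number into NaN, not just '?':

```python
import pandas as pd

# Made-up sample that mimics the column: numeric strings plus a '?' placeholder
s = pd.Series(['1', '?', '3', '10'], name='Bare Nuclei')

# Unparseable entries become NaN, the rest become floats
numeric = pd.to_numeric(s, errors='coerce')

# mean() skips NaN by default, so it averages only the three valid values
filled = numeric.fillna(numeric.mean())
print(filled)
```

Here the mean is taken over 1, 3 and 10, so the former '?' row ends up holding 14/3.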

Alternative solution:

mask = data['Bare Nuclei'] == '?'
data['Bare Nuclei'] = data['Bare Nuclei'].mask(mask).astype(float)
data['Bare Nuclei'] = data['Bare Nuclei'].fillna(data['Bare Nuclei'].mean())

Verify solution:

column_names = ['Sample code number', 'Clump Thickness', 
                'Uniformity of Cell Size', 'Uniformity of Cell Shape',
                'Marginal Adhesion', 'Single Epithelial Cell Size',
                'Bare Nuclei', 'Bland Chromatin', 'Normal Nucleoli',
                'Mitoses', 'Class']
data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', names = column_names )
#print (data.head())

#get index values by condition
L = data.index[data['Bare Nuclei'] == '?'].tolist()
print (L)
[23, 40, 139, 145, 158, 164, 235, 249, 275, 292, 294, 297, 315, 321, 411, 617]

#get mean of values converted to numeric
print (data['Bare Nuclei'].replace('?', np.nan).astype(float).mean())
3.5446559297218156

print (data.loc[L, 'Bare Nuclei'])
23     ?
40     ?
139    ?
145    ?
158    ?
164    ?
235    ?
249    ?
275    ?
292    ?
294    ?
297    ?
315    ?
321    ?
411    ?
617    ?
Name: Bare Nuclei, dtype: object

#convert to numeric - replace `?` to NaN and cast to float
data['Bare Nuclei'] = data['Bare Nuclei'].replace('?', np.nan).astype(float)
#more general
#data['Bare Nuclei'] = pd.to_numeric(data['Bare Nuclei'], errors='coerce')
#replace NaNs by means
data['Bare Nuclei'] = data['Bare Nuclei'].fillna(data['Bare Nuclei'].mean())

#verify replacing
print (data.loc[L, 'Bare Nuclei'])
23     3.544656
40     3.544656
139    3.544656
145    3.544656
158    3.544656
164    3.544656
235    3.544656
249    3.544656
275    3.544656
292    3.544656
294    3.544656
297    3.544656
315    3.544656
321    3.544656
411    3.544656
617    3.544656
Name: Bare Nuclei, dtype: float64
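
A related option, not part of the answer above, is to let read_csv treat '?' as missing while loading, through its na_values parameter. The sketch below assumes a tiny in-memory CSV with made-up rows and shortened column names instead of the real URL:

```python
import io
import pandas as pd

# Three made-up rows in the same spirit as the data file; '?' marks a missing value
csv_text = "1000025,5,1\n1057013,8,?\n1096800,6,10\n"

# na_values='?' makes read_csv parse '?' as NaN, so the column is numeric from the start
df = pd.read_csv(io.StringIO(csv_text),
                 names=['Sample code number', 'Clump Thickness', 'Bare Nuclei'],
                 na_values='?')

# Fill the missing entry with the mean of the remaining values
df['Bare Nuclei'] = df['Bare Nuclei'].fillna(df['Bare Nuclei'].mean())
print(df)
```

With this approach no separate replace or astype step is needed before fillna.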

temp[index,'Bare Nuclei'] mixes boolean indexing with label-based column selection, which plain [] indexing does not support. Instead, change

index = temp['Bare Nuclei'] == '?'
temp[index,'Bare Nuclei'] = mean

to

s=temp['Bare Nuclei']
temp['Bare Nuclei']=s.where(s!='?',mean)

s.where(s!='?',mean) keeps each element where the condition s!='?' holds and replaces it with mean where the condition is not met (this can be confusing at first glance).
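
Because the direction of where is easy to get backwards, the following sketch compares it with Series.mask, which replaces values where the condition is True. The sample values and the replacement 0 are made up for illustration:

```python
import pandas as pd

s = pd.Series(['1', '?', '3'])

# where: keep values where the condition is True, replace the rest
kept = s.where(s != '?', 0)

# mask: replace values where the condition is True, keep the rest
masked = s.mask(s == '?', 0)

print(kept.tolist())
print(masked.tolist())
```

Both calls give the same result. Note that the Series keeps its object dtype, with strings next to the inserted number, so a conversion such as pd.to_numeric is still needed before doing arithmetic on the column.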
