
Setting a value in a pandas DataFrame raises the error: 'Series' objects are mutable, thus they cannot be hashed

I want to change values in a pandas DataFrame using the condition data['Bare Nuclei'] != '?'.

import pandas as pd
import numpy as np
column_names = ['Sample code number', 'Clump Thickness', 
                'Uniformity of Cell Size', 'Uniformity of Cell Shape',
                'Marginal Adhesion', 'Single Epithelial Cell Size',
                'Bare Nuclei', 'Bland Chromatin', 'Normal Nucleoli',
                'Mitoses', 'Class']
data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', names = column_names )
mean = 0
n = 0
for index,row in data.iterrows():
    if row['Bare Nuclei'] != '?':
        n += 1
        mean += int(row['Bare Nuclei'])
mean = mean / n
temp = data
index = temp['Bare Nuclei'] == '?'
temp[index,'Bare Nuclei'] = mean

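Jupyter Notebook gives me the error quoted in the title ('Series' objects are mutable, thus they cannot be hashed).

I would like to know how to change values in the DataFrame, and why my approach is wrong. Can you help me? I look forward to your help!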

Add DataFrame.loc to the last line, because you need to set values in a DataFrame column:

temp.loc[index,'Bare Nuclei'] = mean
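
To see why the original last line fails, here is a minimal sketch on a small made-up frame (the toy DataFrame, its values, and the fill value 5.5 are assumptions for illustration, not the real dataset). Plain [] indexing receives the tuple (boolean Series, 'Bare Nuclei') as a single column key and has to hash it, whereas .loc takes a row selector and a column label as two separate arguments.

```python
import pandas as pd

# Toy stand-in for the real dataset (values are made up for illustration).
df = pd.DataFrame({'Bare Nuclei': ['1', '?', '10', '?'],
                   'Class': [2, 2, 4, 4]})

is_missing = df['Bare Nuclei'] == '?'   # boolean Series, one flag per row

# df[is_missing, 'Bare Nuclei'] passes the tuple (Series, str) as ONE key.
# pandas needs to hash that key to look it up among the columns, and a
# Series is not hashable, so a TypeError is raised.
try:
    df[is_missing, 'Bare Nuclei'] = 5.5
    error_message = None
except TypeError as exc:
    # The exact wording may differ between pandas versions.
    error_message = str(exc)
print(error_message)

# .loc accepts the row selector and the column label separately.
df.loc[is_missing, 'Bare Nuclei'] = 5.5
print(df['Bare Nuclei'].tolist())
```

After the .loc assignment only the rows that held '?' are changed; the other rows keep their original string values, so the column still has object dtype until it is converted to a numeric type.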

However, it is best to avoid loops in pandas because they are slow. A better solution is to replace ? with NaN and then fillna with the mean:

data['Bare Nuclei'] = data['Bare Nuclei'].replace('?', np.nan).astype(float)
#more general
#data['Bare Nuclei'] = pd.to_numeric(data['Bare Nuclei'], errors='coerce')
data['Bare Nuclei'] = data['Bare Nuclei'].fillna(data['Bare Nuclei'].mean())
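
As a small illustration of the commented-out "more general" variant, the sketch below runs pd.to_numeric with errors='coerce' on a made-up Series (the values are assumptions chosen for illustration, not taken from the dataset). It relies on Series.mean skipping NaN by default, so the mean is computed only from the values that parsed as numbers.

```python
import pandas as pd

# Toy column (made-up values): digits stored as strings plus '?' placeholders.
s = pd.Series(['1', '?', '10', '3', '?'], name='Bare Nuclei')

# errors='coerce' turns anything that cannot be parsed as a number into NaN.
numeric = pd.to_numeric(s, errors='coerce')

# mean() skips NaN by default, so it averages only 1, 10 and 3.
col_mean = numeric.mean()

# Fill the former '?' positions with that mean.
filled = numeric.fillna(col_mean)
print(filled.tolist())
```

Unlike replace('?', np.nan), this variant does not need to know the placeholder in advance: any non-numeric string becomes NaN, which is convenient but can also hide unexpected values.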

Alternative solution:

mask = data['Bare Nuclei'] == '?'
data['Bare Nuclei'] = data['Bare Nuclei'].mask(mask).astype(float)
data['Bare Nuclei'] = data['Bare Nuclei'].fillna(data['Bare Nuclei'].mean())

Verifying the solution:

column_names = ['Sample code number', 'Clump Thickness', 
                'Uniformity of Cell Size', 'Uniformity of Cell Shape',
                'Marginal Adhesion', 'Single Epithelial Cell Size',
                'Bare Nuclei', 'Bland Chromatin', 'Normal Nucleoli',
                'Mitoses', 'Class']
data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', names = column_names )
#print (data.head())

#get index values by condition
L = data.index[data['Bare Nuclei'] == '?'].tolist()
print (L)
[23, 40, 139, 145, 158, 164, 235, 249, 275, 292, 294, 297, 315, 321, 411, 617]

#get mean of values converted to numeric
print (data['Bare Nuclei'].replace('?', np.nan).astype(float).mean())
3.5446559297218156

print (data.loc[L, 'Bare Nuclei'])
23     ?
40     ?
139    ?
145    ?
158    ?
164    ?
235    ?
249    ?
275    ?
292    ?
294    ?
297    ?
315    ?
321    ?
411    ?
617    ?
Name: Bare Nuclei, dtype: object

#convert to numeric - replace `?` to NaN and cast to float
data['Bare Nuclei'] = data['Bare Nuclei'].replace('?', np.nan).astype(float)
#more general
#data['Bare Nuclei'] = pd.to_numeric(data['Bare Nuclei'], errors='coerce')
#replace NaNs by means
data['Bare Nuclei'] = data['Bare Nuclei'].fillna(data['Bare Nuclei'].mean())

#verify replacing
print (data.loc[L, 'Bare Nuclei'])
23     3.544656
40     3.544656
139    3.544656
145    3.544656
158    3.544656
164    3.544656
235    3.544656
249    3.544656
275    3.544656
292    3.544656
294    3.544656
297    3.544656
315    3.544656
321    3.544656
411    3.544656
617    3.544656
Name: Bare Nuclei, dtype: float64

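temp[index,'Bare Nuclei'] mixes boolean indexing with label-based column selection, which does not work with plain [] indexing. Instead, change

index = temp['Bare Nuclei'] == '?'
temp[index,'Bare Nuclei'] = mean

to

s=temp['Bare Nuclei']
temp['Bare Nuclei']=s.where(s!='?',mean)

where(s!='?',mean) actually means: replace with mean the values of the elements for which the condition s!='?' is not satisfied (a bit confusing at first glance).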
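
Because the direction of where is easy to get backwards, the following sketch contrasts it with Series.mask on a tiny made-up Series (the data and the fill value 5.5 are assumptions standing in for the real column and its mean). where keeps the values for which the condition is True, while mask replaces them.

```python
import pandas as pd

s = pd.Series(['1', '?', '10'])   # toy data, made up for illustration
fill = 5.5                        # stand-in for the computed mean

# where: keep values where the condition is True, replace the rest.
kept = s.where(s != '?', fill)

# mask: the mirror image, replace values where the condition is True.
masked = s.mask(s == '?', fill)

print(kept.tolist())    # ['1', 5.5, '10']
print(masked.tolist())  # ['1', 5.5, '10']
```

Both calls give the same result here; which one reads more naturally depends on whether the condition describes the values to keep or the values to replace.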
