
“fillna” command in Python not returning mean using pandas

I am trying to run the fillna command in Python. It fails to replace the NaN values with anything, and it does not raise an error.

import io

import numpy as np
import pandas as pd
import requests

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data'
s = requests.get(url).content
df = pd.read_csv(io.StringIO(s.decode('utf-8')))
df.columns = ['Scn', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7', 'A8', 'A9', 'A10', 'CLASS']

df.to_csv("wisconsinbreast.csv")

m, n = df.shape
# print(m, n)
df = df.replace('?', np.nan)  # the raw file marks missing values with '?'
# print(df)
# print(df.mean())
print(df.fillna(df.mean()))

In row 22 of the output, the NaN is still there. I have tried everything I could find by searching the questions here, but nothing gives me any feedback on why it is failing. As I understand it, df.mean() should skip the NaN values when it calculates, but df.mean() does not return a value for the column that contains NaN.

na_values in read_csv

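That '?' trips everything up. When read_csv sees it, it treats the whole column as dtype object and reads the values in as strings. Replacing '?' with NaN afterwards does not change that: the column still holds strings, which is why df.mean() returns nothing for it and fillna has no value to fill with. Sure, you could fix this after the fact, but I suggest using the na_values argument to head this off at the beginning: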

df = pd.read_csv(io.StringIO(s.decode('utf-8')), na_values=['?'])
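A minimal sketch of the difference, using a made-up three-row sample instead of the real file (the sample string and the names raw and clean are assumptions for illustration only):

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded data; '?' marks a missing value.
sample = "Scn,A7\n1,10\n2,?\n3,4\n"

# Default parsing: the '?' forces the whole A7 column to be read as strings.
raw = pd.read_csv(io.StringIO(sample))

# With na_values, '?' is treated as missing and A7 is parsed as numbers.
clean = pd.read_csv(io.StringIO(sample), na_values=['?'])

print(pd.api.types.is_numeric_dtype(raw['A7']))  # False
print(clean['A7'].dtype)                         # float64
print(clean['A7'].isna().sum())                  # 1
```

The column becomes float64 rather than an integer type because NaN is itself a floating-point value.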

pd.to_numeric

But if you really want to fix it after the fact, do this instead of the replace:

df.A7 = pd.to_numeric(df.A7, errors='coerce')
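Here is a small sketch of what errors='coerce' does, on a made-up frame rather than the real data (the values are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical frame mimicking what read_csv produces when '?' is present:
# every value in A7 is a string.
df = pd.DataFrame({'A6': [7, 2, 3], 'A7': ['10', '?', '4']})

# errors='coerce' turns anything that cannot be parsed as a number into NaN.
df['A7'] = pd.to_numeric(df['A7'], errors='coerce')

print(df['A7'].dtype)         # float64
print(df['A7'].isna().sum())  # 1
```

If you do not know in advance which columns are affected, df.apply(pd.to_numeric, errors='coerce') should apply the same conversion to every column, at the cost of also turning any genuinely non-numeric values into NaN.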

In either case, fillna should work as expected afterwards:

df.fillna(df.mean())

         Scn  A2  A3  A4  A5  A6         A7  A8  A9  A10  CLASS
0    1002945   5   4   4   5   7  10.000000   3   2    1      2
1    1015425   3   1   1   1   2   2.000000   3   1    1      2
2    1016277   6   8   8   1   3   4.000000   3   7    1      2
3    1017023   4   1   1   3   2   1.000000   3   1    1      2
4    1017122   8  10  10   8   7  10.000000   9   7    1      4
5    1018099   1   1   1   1   2  10.000000   3   1    1      2
6    1018561   2   1   2   1   2   1.000000   3   1    1      2
7    1033078   2   1   1   1   2   1.000000   1   1    5      2
8    1033078   4   2   1   1   2   1.000000   2   1    1      2
9    1035283   1   1   1   1   1   1.000000   3   1    1      2
10   1036172   2   1   1   1   2   1.000000   2   1    1      2
11   1041801   5   3   3   3   2   3.000000   4   4    1      4
12   1043999   1   1   1   1   2   3.000000   3   1    1      2
13   1044572   8   7   5  10   7   9.000000   5   5    4      4
14   1047630   7   4   6   4   6   1.000000   4   3    1      4
15   1048672   4   1   1   1   2   1.000000   2   1    1      2
16   1049815   4   1   1   1   2   1.000000   3   1    1      2
17   1050670  10   7   7   6   4  10.000000   4   1    2      4
18   1050718   6   1   1   1   2   1.000000   3   1    1      2
19   1054590   7   3   2  10   5  10.000000   5   4    4      4
20   1054593  10   5   5   3   6   7.000000   7  10    1      4
21   1056784   3   1   1   1   2   1.000000   2   1    1      2
22   1057013   8   4   5   1   2   3.548387   7   3    1      4
23   1059552   1   1   1   1   2   1.000000   3   1    1      2
24   1065726   5   2   3   4   2   7.000000   3   6    1      4
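To check the fill on something small enough to verify by hand, here is a sketch with a made-up frame (the values are illustrative, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical numeric frame with one missing value in A7.
df = pd.DataFrame({'A6': [7, 2, 3], 'A7': [10.0, np.nan, 4.0]})

# mean() skips NaN by default, so the mean of A7 is (10 + 4) / 2 = 7.0.
means = df.mean()

# Passing a Series to fillna fills each column with the value
# stored under the matching column label.
filled = df.fillna(means)
print(filled)
```

If the frame also holds columns that are not meant to be numeric, df.mean(numeric_only=True) restricts the calculation to the numeric ones.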


 