Numpy Summing All At Once Gives NaN, But Summing Separately Does Not

I have some data, all of which is non-negative. Numpy says its sum is nan, but I don't believe it is. Here is what I did:

First, I read in the training data:

dataframe = pandas.read_csv( "buggy.csv" )
training = dataframe.ix[:,dataframe.columns != "Survived"].values.astype( np.float32 )

The training features are stored in a numpy array. I sum the first 61 rows and add that to the sum of the 62nd row:

sum1 = training[0:61][:].sum()
sum2 = training[62][:].sum()
print sum1 + sum2

I get the following output: 5788.54

I sum the first 62 rows:

print training[0:62][:].sum()

I get the following output: nan

Why do I get nan with the second summation? All my data are non-negative, so I don't think the order of the numbers matters. Thanks in advance for the help.

(Also, this is python 2.7 from anaconda 4.0.4)


Here is the full code:

import numpy as np
import pandas

dataframe = pandas.read_csv( "buggy.csv" )
training = dataframe.ix[:,dataframe.columns != "Survived"].values.astype( np.float32 )
labels = dataframe[ "Survived" ].values.astype( np.float32 )


sum1 = training[0:61][:].sum()
sum2 = training[62][:].sum()
print sum1 + sum2
print training[0:62][:].sum()

Here is the minimal data necessary to reproduce the problem (just copy-paste it into a file called "buggy.csv"):

,Survived,Pclass,Sex,Age,SibSp,Parch,Fare,Embarked
0,0,3,0,22.0,1,0,7.25,2.0
1,1,1,1,38.0,1,0,71.2833,0.0
2,1,3,1,26.0,0,0,7.925,2.0
3,1,1,1,35.0,1,0,53.1,2.0
4,0,3,0,35.0,0,0,8.05,2.0
5,0,3,0,29.6991176471,0,0,8.4583,1.0
6,0,1,0,54.0,0,0,51.8625,2.0
7,0,3,0,2.0,3,1,21.075,2.0
8,1,3,1,27.0,0,2,11.1333,2.0
9,1,2,1,14.0,1,0,30.0708,0.0
10,1,3,1,4.0,1,1,16.7,2.0
11,1,1,1,58.0,0,0,26.55,2.0
12,0,3,0,20.0,0,0,8.05,2.0
13,0,3,0,39.0,1,5,31.275,2.0
14,0,3,1,14.0,0,0,7.8542,2.0
15,1,2,1,55.0,0,0,16.0,2.0
16,0,3,0,2.0,4,1,29.125,1.0
17,1,2,0,29.6991176471,0,0,13.0,2.0
18,0,3,1,31.0,1,0,18.0,2.0
19,1,3,1,29.6991176471,0,0,7.225,0.0
20,0,2,0,35.0,0,0,26.0,2.0
21,1,2,0,34.0,0,0,13.0,2.0
22,1,3,1,15.0,0,0,8.0292,1.0
23,1,1,0,28.0,0,0,35.5,2.0
24,0,3,1,8.0,3,1,21.075,2.0
25,1,3,1,38.0,1,5,31.3875,2.0
26,0,3,0,29.6991176471,0,0,7.225,0.0
27,0,1,0,19.0,3,2,263.0,2.0
28,1,3,1,29.6991176471,0,0,7.8792,1.0
29,0,3,0,29.6991176471,0,0,7.8958,2.0
30,0,1,0,40.0,0,0,27.7208,0.0
31,1,1,1,29.6991176471,1,0,146.5208,0.0
32,1,3,1,29.6991176471,0,0,7.75,1.0
33,0,2,0,66.0,0,0,10.5,2.0
34,0,1,0,28.0,1,0,82.1708,0.0
35,0,1,0,42.0,1,0,52.0,2.0
36,1,3,0,29.6991176471,0,0,7.2292,0.0
37,0,3,0,21.0,0,0,8.05,2.0
38,0,3,1,18.0,2,0,18.0,2.0
39,1,3,1,14.0,1,0,11.2417,0.0
40,0,3,1,40.0,1,0,9.475,2.0
41,0,2,1,27.0,1,0,21.0,2.0
42,0,3,0,29.6991176471,0,0,7.8958,0.0
43,1,2,1,3.0,1,2,41.5792,0.0
44,1,3,1,19.0,0,0,7.8792,1.0
45,0,3,0,29.6991176471,0,0,8.05,2.0
46,0,3,0,29.6991176471,1,0,15.5,1.0
47,1,3,1,29.6991176471,0,0,7.75,1.0
48,0,3,0,29.6991176471,2,0,21.6792,0.0
49,0,3,1,18.0,1,0,17.8,2.0
50,0,3,0,7.0,4,1,39.6875,2.0
51,0,3,0,21.0,0,0,7.8,2.0
52,1,1,1,49.0,1,0,76.7292,0.0
53,1,2,1,29.0,1,0,26.0,2.0
54,0,1,0,65.0,0,1,61.9792,0.0
55,1,1,0,29.6991176471,0,0,35.5,2.0
56,1,2,1,21.0,0,0,10.5,2.0
57,0,3,0,28.5,0,0,7.2292,0.0
58,1,2,1,5.0,1,2,27.75,2.0
59,0,3,0,11.0,5,2,46.9,2.0
60,0,3,0,22.0,0,0,7.2292,0.0
61,1,1,1,38.0,0,0,80.0,
62,0,1,0,45.0,1,0,83.475,2.0

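You are skipping row 61, and that is the problematic one. training[0:61][:].sum() covers rows 0 through 60 and training[62] is row 62, so the first computation never includes row 61; training[0:62] does.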
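To make the slice boundaries concrete, here is a minimal sketch. The `rows` array is a hypothetical stand-in for `training` (not the asker's data), built so that each row stores its own index:

```python
import numpy as np

# Hypothetical stand-in for `training`: 63 rows, 2 columns,
# where every entry in a row equals that row's index.
rows = np.repeat(np.arange(63, dtype=np.float32)[:, None], 2, axis=1)

first = rows[0:61]   # rows 0..60; the stop index 61 is excluded
single = rows[62]    # row 62 only
both = rows[0:62]    # rows 0..61

print(first[-1, 0])  # index of the last row covered by rows[0:61]
print(single[0])     # index of the row covered by rows[62]
print(both[-1, 0])   # index of the last row covered by rows[0:62]
```

Neither `rows[0:61]` nor `rows[62]` touches row 61, while `rows[0:62]` ends on it.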

training[61]
Out[10]: array([ 61.,   1.,   1.,  38.,   0.,   0.,  80.,  nan], dtype=float32)

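The last field (Embarked) is empty for row 61 in the CSV, so it is read as nan and the row has only 7 valid values. Any sum that includes a nan is itself nan.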
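As a rough illustration of how one might track down such a value, the sketch below uses a small toy array (an assumption, not the asker's data) in which one entry is nan, mimicking the empty Embarked field. It shows the nan propagating through `sum()`, locates the offending row with `np.isnan`, and uses `np.nansum` to sum while treating nan as zero:

```python
import numpy as np

# Toy array standing in for `training`; the nan mimics the empty field.
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, np.nan],
                 [7.0, 8.0, 9.0]], dtype=np.float32)

total = data.sum()                                   # nan propagates into the sum
nan_rows = np.where(np.isnan(data).any(axis=1))[0]   # indices of rows containing nan
total_ignoring_nan = np.nansum(data)                 # sum with nan treated as zero

print(total)
print(nan_rows)
print(total_ignoring_nan)
```

Whether ignoring the nan is appropriate depends on the data. For a feature such as Embarked, filling in or dropping the missing value before converting to a numpy array may be the better choice.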
