Python pandas issue: indexing a Series at index 13 gives an error?

Question

As you can see below, my proteinID Series has 4292 entries. When I try to print them one by one, I get a KeyError at the 13th entry (i = 12), and I don't understand why.

Any idea what's going on?

print proteinID.shape
print X_final.shape
for i,prot in enumerate(X_final):
    print i
    print prot
    print proteinID[i]

This gives me:

(4292L,)
(4292L, 4L)
0
[ 0.01070217  0.86624627  0.30031799  1.0022054 ]
Q9BV57
1
[ 0.14132098  0.5899623  -0.08037944  0.04028686]
Q04446
2
[ 0.14768145  0.37698604 -0.08798323 -0.71181829]
P61604
3
[ 0.23194252 -0.17301326 -0.20914528  0.27447231]
Q15029
4
[ 0.13608163  0.41788998  0.06103427 -0.1557695 ]
Q9NRX4
5
[ 0.11981057  0.62419406  0.085566    0.43029529]
P31946
6
[ 0.14734698  0.53942167  0.1647835   0.20525244]
P62258
7
[ 0.13301821  0.25249911  0.32216093  0.46965642]
Q04917
8
[ 0.30891193  0.35936887  0.14029331  0.22116058]
P61981
9
[ 0.15670011 -0.0317209   0.48168144  0.58226224]
P31947;REV__Q13315
10
[ 0.059664    0.52769527  0.09302036  0.28445371]
P27348
11
[ 0.22201161  0.703846    0.19846719  0.53470435]
P63104
12
[ 0.53312759  0.48972197 -0.15224852  0.16086491]

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-54-45a793f9a457> in <module>()
      4     print i
      5     print prot
----> 6     print proteinID[i]

C:\Anaconda\lib\site-packages\pandas\core\series.pyc in __getitem__(self, key)
    507     def __getitem__(self, key):
    508         try:
--> 509             result = self.index.get_value(self, key)
    510
    511             if not np.isscalar(result):

C:\Anaconda\lib\site-packages\pandas\core\index.pyc in get_value(self, series, key)
   1415
   1416         try:
-> 1417             return self._engine.get_value(s, k)
   1418         except KeyError as e1:
   1419             if len(self) > 0 and self.inferred_type in ['integer','boolean']:

pandas\index.pyx in pandas.index.IndexEngine.get_value (pandas\index.c:3109)()

pandas\index.pyx in pandas.index.IndexEngine.get_value (pandas\index.c:2840)()

pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3700)()

pandas\hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:7229)()

pandas\hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:7167)()

KeyError: 12L

EDIT: the first 51 values of proteinID:

for i,n in enumerate(proteinID):
    print i, n

0 Q9BV57
1 Q04446
2 P61604
3 Q15029
4 Q9NRX4
5 P31946
6 P62258
7 Q04917
8 P61981
9 P31947;REV__Q13315
10 P27348
11 P63104
12 O60613
13 Q9C0C2
14 Q9Y2I7
15 Q01970
16 P19174
17 P09543
18 Q6L8Q7
19 P62333
20 P62191
21 P17980
22 P43686
23 P35998
24 P62195
25 Q99460
26 O75832
27 O00231
28 O00232
29 Q9UNM6
30 O00487
31 Q13200
32 O43242
33 P55036
34 Q15008
35 P51665
36 P48556
37 O00233
38 Q13442
39 P82912
40 O15235
41 O60783
42 Q9Y3D3
43 Q9Y2R5
44 Q9NVS2
45 Q9Y676
46 Q9Y399
47 P82650
48 Q9Y3D9
49 P82663
50 Q9BYN8

Answer 1

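I noticed that the problem appears after removing the rows that contain NaN values using:

# Instead of imputing, remove rows that contain NaN values
valid_mask = [np.all(~np.isnan(x)) for x in data.values]
print data[valid_mask].shape
X_imputed = data[valid_mask].values
proteinID = proteinID[valid_mask]

Filtering with a boolean mask keeps the original index labels, so the labels of the dropped rows are simply missing from proteinID. With an integer index, proteinID[i] is a lookup by label, not by position. Label 12 belonged to a row with a NaN value, so it no longer exists and the lookup raises KeyError: 12L. Iterating with enumerate(proteinID) works because it walks over the values in order and never consults the index. To look values up by position, use proteinID.iloc[i], or renumber the index after filtering with proteinID.reset_index(drop=True).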
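The retained-index behaviour can be reproduced with a small sketch. It uses made-up stand-in data (four rows, one containing a NaN) rather than the asker's real data, and is written for Python 3 and a recent pandas, so print is a function and `notna` is assumed to be available. The names `filtered`, `by_position`, and `renumbered` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: the row with label 2 contains a NaN.
data = pd.DataFrame({"a": [0.1, 0.2, np.nan, 0.4],
                     "b": [1.0, 2.0, 3.0, 4.0]})
proteinID = pd.Series(["Q9BV57", "Q04446", "P61604", "Q15029"])

# Keep only the rows without any NaN value.
valid_mask = data.notna().all(axis=1)
X_final = data[valid_mask].values
filtered = proteinID[valid_mask]

# The original labels are retained, so label 2 is now missing.
remaining_labels = list(filtered.index)

# Indexing with an integer is a label lookup here, so it fails for label 2.
try:
    filtered[2]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Option 1: look values up by position with .iloc.
by_position = [filtered.iloc[i] for i in range(len(X_final))]

# Option 2: renumber the index 0..n-1 so labels match positions again.
renumbered = filtered.reset_index(drop=True)
by_label = [renumbered[i] for i in range(len(X_final))]

print(remaining_labels, label_lookup_failed)
print(by_position)
```

Either option keeps the IDs aligned with the rows of the filtered array. Resetting the index is convenient when the original row labels are no longer needed; `.iloc` leaves them intact in case they are.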

solution1
0 ACCEPTED 2015-11-05 12:29:26
