Pandas DataFrame Advanced Slicing

I am an R user and I am struggling a bit with the move to Python, especially with the indexing capabilities of Pandas.

My second column is household_id. I sorted my dataframe on this column and ran the following expressions, which returned different results where I expected them to be the same. Are these expressions equivalent? If so, why do I see different results?

In [63]: ground_truth.columns
Out[63]: Index([Timestamp, household_id, ... (continues)

In [59]: ground_truth.ix[1107177,'household_id']
Out[59]: 2

In [60]: ground_truth.ix[1107177,1]
Out[60]: 2.0

In [61]: ground_truth.iloc[1107177,1]
Out[61]: 4.0

In [62]: ground_truth['household_id'][1107177]
Out[62]: 2

PS: Unfortunately I can't post the data (it is too big).

NOTE: When you sort by a column, you rearrange the index, and, assuming it wasn't already sorted that way, you end up with integer labels that don't equal their linear positions in the array.

First, ix tries integers as labels first and only then as positions, so it follows immediately that In [59] and In [62] return the same value. Second, if the index is not 0:n - 1, then 1107177 is a label, not an integer position, which explains the difference between In [60] and In [61]. As for the float casting, it looks like you might be using an older version of pandas. This doesn't happen in git master.

Here are the docs on ix.

Here's an example with a toy DataFrame:

In [1]:

from numpy.random import randn
from pandas import DataFrame

df = DataFrame(randn(10, 3), columns=list('abc'))
print df
print
print df.sort('a')

           a          b          c
0      -1.80      -0.28      -1.10
1      -0.58       1.00      -0.48
2      -2.50       1.59      -1.42
3      -1.00      -0.12      -0.93
4      -0.65       1.41       1.20
5       0.51       0.96       1.28
6      -0.28       0.13       1.59
7       1.28      -0.84       0.51
8       0.77      -1.26      -0.50
9      -0.59      -1.34      -1.06

           a          b          c
2      -2.50       1.59      -1.42
0      -1.80      -0.28      -1.10
3      -1.00      -0.12      -0.93
4      -0.65       1.41       1.20
9      -0.59      -1.34      -1.06
1      -0.58       1.00      -0.48
6      -0.28       0.13       1.59
5       0.51       0.96       1.28
8       0.77      -1.26      -0.50
7       1.28      -0.84       0.51

Notice that the row labels of the sorted frame are still integers, but they no longer match the rows' positions.
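
For readers on a recent pandas, where ix and DataFrame.sort have both been removed, the same label-versus-position distinction can be sketched with loc, iloc and sort_values. This is only a minimal illustration: the frame and its values below are made up and are not the asker's data.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only.
df = pd.DataFrame(
    {"a": [0.5, -1.2, 0.1, -0.7], "b": [10.0, 20.0, 30.0, 40.0]}
)

# Sorting reorders the rows but keeps each row's original index label.
sorted_df = df.sort_values("a")

# Label-based lookup: the row whose index label is 1.
by_label = sorted_df.loc[1, "b"]

# Position-based lookup: the second row and second column of the sorted frame.
by_position = sorted_df.iloc[1, 1]

print(list(sorted_df.index))  # [1, 3, 2, 0]
print(by_label)               # 20.0
print(by_position)            # 40.0
```

The two lookups use the same integer, 1, yet return different rows, because loc reads it as a label and iloc reads it as a position.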
