Pandas DataFrame column (Series) has a different index than the DataFrame?

Consider this small script:

import pandas as pd

aa = pd.DataFrame({'a': [1,2,3]})
bb = aa.a
bb.index = bb.index + 1
aa['b'] = bb
print(aa)
print(aa.a - aa.b)

The output is:

   a    b
0  1  NaN
1  2  1.0
2  3  2.0

0    NaN
1    0.0
2    0.0
3    NaN

whereas I was expecting aa.a - aa.b to be:

0    NaN
1    1.0
2    1.0

How is this possible? Is it a Pandas bug?

aa = pd.DataFrame({'a': [1,2,3]})
bb = aa.a
bb.index = bb.index + 1
aa['b'] = bb
aa = aa.reset_index(drop=True)  # add this; reset_index returns a new DataFrame, so assign it back

Your indexes do not match.

When you do aa.b - aa.a, you're subtracting two pandas.Series that have the same length but not the same index:

aa.a

1    1
2    2
3    3
Name: a, dtype: int64

Whereas:

aa.b

0    NaN
1    1.0
2    2.0
Name: b, dtype: float64

And when you do:

print(aa.b - aa.a)

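you're printing the result of aligning these two pandas.Series on their index labels (regardless of the operation type: addition or subtraction). That's why the indices [0,1,2] and [1,2,3] are merged into a new index from 0 to 3: [0,1,2,3]. Labels that exist in only one of the two Series (here 0 and 3) get NaN.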
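To make the alignment visible without relying on how aa.a ended up with a shifted index, here is a minimal sketch that builds the two Series by hand. The names left, right, diff and union_index are made up for this illustration; the values mirror aa.a and aa.b shown above.

```python
import pandas as pd

# Same length, different index labels (hand-built to mirror aa.a and aa.b).
left = pd.Series([1, 2, 3], index=[1, 2, 3], name="a")
right = pd.Series([float("nan"), 1.0, 2.0], index=[0, 1, 2], name="b")

# Arithmetic matches values by label, not by position.
diff = left - right
print(diff)

# The result index is the union of both indexes.
union_index = left.index.union(right.index)
print(list(union_index))  # [0, 1, 2, 3]
```

Only labels 1 and 2 exist on both sides, so only those rows hold a number; 0 and 3 come out as NaN.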

For instance, if you shift bb.index by 2 instead of 1:

bb.index = bb.index + 2

this time you will have 5 rows in the new pandas.Series instead of 4, and so on:

bb.index = bb.index + 2
aa['b'] = bb
print(aa.a - aa.b)

0    NaN
1    NaN
2    0.0
3    NaN
4    NaN
dtype: float64
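The "and so on" pattern can be checked with a short loop. This is only a sketch: base, shifted and lengths are hypothetical names, and the shifted Series is an explicit copy so the result does not depend on how the column was obtained.

```python
import pandas as pd

base = pd.Series([1, 2, 3])  # default index 0, 1, 2

lengths = {}
for shift in (0, 1, 2, 3):
    shifted = base.copy()
    shifted.index = shifted.index + shift
    # Outer alignment: one row per distinct label across both indexes.
    lengths[shift] = len(base - shifted)

print(lengths)  # {0: 3, 1: 4, 2: 5, 3: 6}
```

Each extra step of shift adds one label that the other Series does not have, hence one more row (filled with NaN).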

Use this code to get what you expect. Taking a copy gives bb its own object, so changing bb.index no longer changes the index of aa.a:

aa = pd.DataFrame({'a': [1,2,3]})
bb = aa.a.copy()
bb.index = bb.index + 1
aa['b'] = bb
print(aa)
print(aa.a - aa.b)
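If the underlying goal is "each value minus the value from the previous row", Series.shift might be a simpler route than editing the index by hand. This is an alternative that is not part of the answers above; the sketch below compares it with the copy() approach, using made-up names via_copy and via_shift.

```python
import pandas as pd

# Approach from the answer above: copy first, then move the index.
via_copy = pd.DataFrame({"a": [1, 2, 3]})
bb = via_copy["a"].copy()
bb.index = bb.index + 1
via_copy["b"] = bb  # aligned on the DataFrame index: NaN, 1.0, 2.0

# Alternative (assumption: you want the previous row's value in b).
via_shift = pd.DataFrame({"a": [1, 2, 3]})
via_shift["b"] = via_shift["a"].shift(1)

diff_copy = via_copy["a"] - via_copy["b"]
diff_shift = via_shift["a"] - via_shift["b"]
print(diff_copy)
print(diff_copy.equals(diff_shift))  # True
```

Both versions keep the index 0, 1, 2 and give NaN, 1.0, 1.0, which is the output the question was expecting.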
