
Replace a column in Pandas dataframe with another that has same index but in a different order

I'm trying to re-insert into a pandas dataframe a column that I extracted and then reordered by sorting.

Very simply, I have extracted a column from a pandas df:

col1 = df.col1

This column contains integers. I used the .sort() method to order it from smallest to largest, and then did some operations on the data.

col1 = col1.sort_values()  # originally col1.sort(), which later pandas releases removed
# do stuff that changes the values of col1.

Now the indexes of col1 are the same as the indexes of the overall df, but in a different order.

How can I insert the column back into the original dataframe, replacing the col1 that is there at the moment?

I have tried both of the following methods:

1)

df.col1 = col1

2)

df.insert(column_index_of_col1, "col1", col1)

but both methods give me the following error:

ValueError: cannot reindex from a duplicate axis

Any help will be greatly appreciated. Thank you.

Consider this DataFrame:

df = pd.DataFrame({'A': [1, 2, 3], 'B': [6, 5, 4]}, index=[0, 0, 1])

df
Out: 
   A  B
0  1  6
0  2  5
1  3  4

Assign the second column to b, sort it, and square it, for example:

b = df['B']
b = b.sort_values()
b = b**2

Now b is:

b
Out: 
1    16
0    25
0    36
Name: B, dtype: int64
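Assigning this b back to df reproduces the error from the question. The following is a minimal sketch, assuming a reasonably recent pandas; the exact wording of the message differs between versions, so only the exception type is checked here.

```python
import pandas as pd

# Same frame as above: the index label 0 appears twice.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [6, 5, 4]}, index=[0, 0, 1])
b = df['B'].sort_values() ** 2

try:
    # pandas aligns b to df by index label, which is ambiguous
    # because label 0 is duplicated.
    df['B'] = b
    raised = False
except ValueError as err:
    raised = True
    print(err)
```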

Without knowing the exact operation you performed on the column, there is no way to tell whether 25 corresponds to the first row of the original DataFrame or the second, because both rows carry the index label 0. You could invert the operation (take the square root and match the values, for example), but that should not be necessary. It is much easier to start from an index that has unique elements ( df = df.reset_index() ). In that case,

df['B'] = b

should work just fine.
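Put together, the approach might look like the sketch below. It assumes the old index labels are not needed and so passes drop=True, which is my choice rather than part of the suggestion above; a plain reset_index() would instead keep the old labels as an extra column named index.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [6, 5, 4]}, index=[0, 0, 1])

# Replace the duplicated labels with a unique 0..n-1 index.
df = df.reset_index(drop=True)

# Extract, sort, and transform the column; each value keeps its index label.
b = df['B'].sort_values() ** 2

# Assignment aligns on the index, so every value returns to its original row.
df['B'] = b
print(df)
```

Because the assignment aligns by label, the sorted order of b does not matter: column A is untouched and each squared value ends up next to the row it came from.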

