Replace a column in a pandas DataFrame with another that has the same index but in a different order
I'm trying to re-insert into a pandas DataFrame a column that I extracted and then reordered by sorting.
Very simply, I have extracted a column from a pandas df:
col1 = df.col1
This column contains integers. I used the .sort() method to order it from smallest to largest, and then did some operations on the data.
col1.sort()
#do stuff that changes the values of col1.
Now the index of col1 contains the same labels as the index of the overall df, but in a different order.
How can I insert the column back into the original DataFrame, replacing the col1 that is there at the moment?
I have tried both of the following methods:
1)
df.col1 = col1
2)
df.insert(column_index_of_col1, "col1", col1)
but both methods give me the following error:
ValueError: cannot reindex from a duplicate axis
Any help will be greatly appreciated. Thank you.
Consider this DataFrame:
df = pd.DataFrame({'A': [1, 2, 3], 'B': [6, 5, 4]}, index=[0, 0, 1])
df
Out:
   A  B
0  1  6
0  2  5
1  3  4
Assign the second column to b, sort it, and take the square, for example:
b = df['B']
b = b.sort_values()
b = b**2
Now b is:
b
Out:
1    16
0    25
0    36
Name: B, dtype: int64
Without knowing the exact operation you've done on the column, there is no way to know whether 25 corresponds to the first row in the original DataFrame or the second one, because both rows carry the label 0. You could invert the operation (take the square root and match, for example), but I think that would be unnecessary. If you start with an index that has unique elements (df = df.reset_index()), it would be much easier. In that case,
df['B'] = b
should work just fine.
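To make this concrete, here is a minimal sketch that reuses the example frame from above. It first shows the assignment failing while the index has duplicate labels, then shows it succeeding once the index is unique. The names df_unique, b_unique and raised are only for illustration, and using drop=True (which discards the old index rather than keeping it as a column) is an assumption about what you want.

```python
import pandas as pd

# Same frame as above: the label 0 appears twice in the index.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [6, 5, 4]}, index=[0, 0, 1])

# Sort and transform the column; its index is now in the order 1, 0, 0.
b = df['B'].sort_values() ** 2

# With duplicate labels pandas cannot align b back to df, so it raises.
try:
    df['B'] = b
    raised = False
except ValueError:
    raised = True

# Give every row a unique label first (assumption: the old labels are
# not needed, so drop=True discards them).
df_unique = df.reset_index(drop=True)
b_unique = df_unique['B'].sort_values() ** 2

# Assignment aligns on index labels, so each value goes back to its own row.
df_unique['B'] = b_unique
print(df_unique)  # B is now 36, 25, 16 for rows 0, 1, 2
```

If the original labels matter, one option is to call reset_index() without drop=True so they are kept as a column, and restore them with set_index afterwards.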