Python pandas data frame: how to perform operations on two columns with the same name
Say you have a data frame like the following one (note that some columns have the same name):
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(4, 5), columns=list('abcab'))
The issue: if you want to perform some operations on the two 'a' columns, how do you do it, given that they have the same name? I tried to use the replace() and rename() methods to rename one of the two columns and then perform the operations, but I didn't manage to rename only one of the columns.
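rename() maps every column that carries a given label, so it cannot target just one of the duplicates. Instead, you should be able to change the column labels by assigning a complete list of new names (one entry per column) to df.columns: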
df.columns = ['a', 'b', 'c', 'd', 'e']
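Once every label is unique, each former 'a' column can be addressed by name. A minimal sketch, assuming the five-column frame from the question; the second 'a' column becomes 'd' because of the labels assigned above, and the addition is only an illustrative operation:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.rand(4, 5), columns=list('abcab'))

# Overwrite all labels at once; the list length must equal the number of columns
df.columns = ['a', 'b', 'c', 'd', 'e']

# The original second 'a' column is now 'd', so each one can be used on its own
total = df['a'] + df['d']
print(total)
```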
If you don't want to rename the columns, you can use iloc:
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.rand(4, 5), columns=list('abcab'))
print(df)
a b c a b
0 0.548814 0.715189 0.602763 0.544883 0.423655
1 0.645894 0.437587 0.891773 0.963663 0.383442
2 0.791725 0.528895 0.568045 0.925597 0.071036
3 0.087129 0.020218 0.832620 0.778157 0.870012
# select the first 'a' column (position 0)
print(df.iloc[:, 0])
0 0.548814
1 0.645894
2 0.791725
3 0.087129
Name: a, dtype: float64
# select the second 'a' column (position 3)
print(df.iloc[:, 3])
0 0.544883
1 0.963663
2 0.925597
3 0.778157
Name: a, dtype: float64
# select the first 'a' column from the sub-frame df['a']
print(df['a'].iloc[:, 0])
0 0.548814
1 0.645894
2 0.791725
3 0.087129
Name: a, dtype: float64
# select the second 'a' column from the sub-frame df['a']
print(df['a'].iloc[:, 1])
0 0.544883
1 0.963663
2 0.925597
3 0.778157
Name: a, dtype: float64
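Either selection can feed an ordinary operation. A minimal sketch, assuming the 'abcab' frame above; the subtraction, the row-wise sum and the doubling are only illustrative. Positional iloc touches exactly one of the duplicates, while df['a'] returns both of them as a DataFrame:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.rand(4, 5), columns=list('abcab'))

# Positions 0 and 3 both carry the label 'a'; iloc picks one column each
diff = df.iloc[:, 0] - df.iloc[:, 3]

# With duplicate labels, df['a'] is a DataFrame holding every 'a' column
both_a = df['a']
row_sum = both_a.sum(axis=1)  # adds the two 'a' columns row by row

# Modify only the second 'a' column, addressing it by position
df.iloc[:, 3] = df.iloc[:, 3] * 2

print(diff)
print(row_sum)
print(df)
```

Because iloc works on positions, the write-back leaves the first 'a' column untouched, which is what the renaming attempt in the question was after.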
EDIT: If you only need to rename the columns that share a name, use get_loc:
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.rand(4, 5), columns=list('abbab'))
print(df)
a b b a b
0 0.548814 0.715189 0.602763 0.544883 0.423655
1 0.645894 0.437587 0.891773 0.963663 0.383442
2 0.791725 0.528895 0.568045 0.925597 0.071036
3 0.087129 0.020218 0.832620 0.778157 0.870012
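cols = pd.Series(df.columns)
# labels that occur more than once (Index.get_duplicates() was removed in pandas 1.0)
for dup in df.columns[df.columns.duplicated()].unique():
    # for repeated, unsorted labels get_loc returns a boolean mask of their positions
    mask = df.columns.get_loc(dup)
    cols[mask] = [dup + '_' + str(d_idx) if d_idx != 0 else dup for d_idx in range(mask.sum())]
df.columns = cols
print(df)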
a b b_1 a_1 b_2
0 0.548814 0.715189 0.602763 0.544883 0.423655
1 0.645894 0.437587 0.891773 0.963663 0.383442
2 0.791725 0.528895 0.568045 0.925597 0.071036
3 0.087129 0.020218 0.832620 0.778157 0.870012
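The same renaming can be wrapped in a reusable helper. A sketch under stated assumptions: the function name dedupe_columns is hypothetical, it swaps get_loc for groupby().cumcount() (which numbers each occurrence of a label in column order, so it does not depend on get_loc returning a mask), and it assumes no existing label already looks like a generated one such as 'a_1':

```python
import numpy as np
import pandas as pd

def dedupe_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy whose repeated labels get suffixes _1, _2, ..."""
    labels = pd.Series(df.columns)
    # cumcount numbers each occurrence of a label: 0 for the first, 1 for the second, ...
    counts = labels.groupby(labels).cumcount()
    new_labels = [lab if n == 0 else f"{lab}_{n}" for lab, n in zip(labels, counts)]
    out = df.copy()
    out.columns = new_labels
    return out

np.random.seed(0)
df = pd.DataFrame(np.random.rand(4, 5), columns=list('abbab'))
renamed = dedupe_columns(df)
print(list(renamed.columns))

# Each former duplicate can now be used by name
ratio = renamed['a'] / renamed['a_1']
```

Returning a copy keeps the original frame's labels intact, so both versions remain available.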
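Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this post, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.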