
Python pandas data frame: how to perform operations on two columns with the same name

Say you have a data frame like the following (note that some columns share the same name):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(4,5), columns = list('abcab'))

The problem: how do you perform operations on the two columns named 'a', given that they share the same name? I tried the replace() and rename() methods to rename one of the two columns before operating on it, but I could not get them to affect only one column.

You should be able to change the column labels as follows:

df.columns = ['a', 'b', 'c', 'd', 'e']
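
Once the labels are unique, the two former 'a' columns can be addressed by name. The following is a minimal sketch of that idea, assuming the frame has exactly five columns; the addition and the result column name 'a_sum' are hypothetical choices made only for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.rand(4, 5), columns=list('abcab'))

# Assigning a list replaces every label by position; its length must equal
# the number of columns, otherwise pandas raises a ValueError.
df.columns = ['a', 'b', 'c', 'd', 'e']

# The second 'a' column is now 'd', so each column can be used by name.
# 'a_sum' is a hypothetical name chosen for this example.
df['a_sum'] = df['a'] + df['d']
print(df[['a', 'd', 'a_sum']])
```

Note that this relabels all columns at once, so it is only convenient when the full list of new names is known.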

You can use iloc if you don't want to rename the columns:

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.rand(4,5), columns = list('abcab'))
print(df)
          a         b         c         a         b
0  0.548814  0.715189  0.602763  0.544883  0.423655
1  0.645894  0.437587  0.891773  0.963663  0.383442
2  0.791725  0.528895  0.568045  0.925597  0.071036
3  0.087129  0.020218  0.832620  0.778157  0.870012
# select the first 'a' column by position
print(df.iloc[:, 0])
0    0.548814
1    0.645894
2    0.791725
3    0.087129
Name: a, dtype: float64

# select the second 'a' column by position
print(df.iloc[:, 3])
0    0.544883
1    0.963663
2    0.925597
3    0.778157
Name: a, dtype: float64

# df['a'] returns a DataFrame holding both 'a' columns; take the first one
print(df['a'].iloc[:, 0])
0    0.548814
1    0.645894
2    0.791725
3    0.087129
Name: a, dtype: float64

# df['a'] returns a DataFrame holding both 'a' columns; take the second one
print(df['a'].iloc[:, 1])
0    0.544883
1    0.963663
2    0.925597
3    0.778157
Name: a, dtype: float64
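
To actually operate on the two 'a' columns without hard-coding their positions, one option is to look the positions up from the labels first. This is only a sketch: the subtraction and the result column name 'a_diff' are hypothetical, chosen to show the pattern.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.rand(4, 5), columns=list('abcab'))

# Integer positions of every column labelled 'a' (0 and 3 for this frame).
pos = np.flatnonzero(df.columns == 'a')

# Each positional selection is a Series, even though the label is duplicated.
first_a = df.iloc[:, pos[0]]
second_a = df.iloc[:, pos[1]]

# Arithmetic between the two Series aligns on the row index.
# 'a_diff' is a hypothetical name for the result column.
df['a_diff'] = first_a - second_a
print(df)
```

Selecting by position sidesteps the ambiguity of df['a'], which returns both columns as a DataFrame.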

EDIT: If you only need to rename the columns that share a name, use get_loc:

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.rand(4,5), columns = list('abbab'))
print(df)
          a         b         b         a         b
0  0.548814  0.715189  0.602763  0.544883  0.423655
1  0.645894  0.437587  0.891773  0.963663  0.383442
2  0.791725  0.528895  0.568045  0.925597  0.071036
3  0.087129  0.020218  0.832620  0.778157  0.870012

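cols = pd.Series(df.columns)
# Index.get_duplicates() was removed in pandas 1.0; duplicated() finds the repeated labels
for dup in df.columns[df.columns.duplicated()].unique():
    # get_loc returns a boolean mask here because the label is repeated in an unsorted index
    mask = df.columns.get_loc(dup)
    cols[mask] = [dup + '_' + str(d_idx) if d_idx != 0 else dup for d_idx in range(mask.sum())]
df.columns = cols
print(df)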
          a         b       b_1       a_1       b_2
0  0.548814  0.715189  0.602763  0.544883  0.423655
1  0.645894  0.437587  0.891773  0.963663  0.383442
2  0.791725  0.528895  0.568045  0.925597  0.071036
3  0.087129  0.020218  0.832620  0.778157  0.870012
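
The same renaming scheme can also be written as a small reusable helper. The sketch below uses a plain-Python counter instead of get_loc; the function name dedupe_columns and its sep parameter are hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def dedupe_columns(columns, sep='_'):
    """Return labels where the n-th repeat of a name gets the suffix sep + n.

    Hypothetical helper: the first occurrence keeps its name unchanged.
    It does not check whether a generated label such as 'b_1' already exists.
    """
    seen = {}
    out = []
    for name in columns:
        count = seen.get(name, 0)
        out.append(name if count == 0 else f"{name}{sep}{count}")
        seen[name] = count + 1
    return out


np.random.seed(0)
df = pd.DataFrame(np.random.rand(4, 5), columns=list('abbab'))
df.columns = dedupe_columns(df.columns)
print(list(df.columns))  # ['a', 'b', 'b_1', 'a_1', 'b_2']
```

A single pass over the labels keeps the logic independent of whether the index is sorted or how pandas represents the duplicate positions.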


 