I have a dataframe with 111 columns, some of which share the same column name. There are only 27 unique column names.
>>> has_2.head(6)
Has_MCS_A Has_MCS_A Has_MCS_A Has_MCS_A \
0 0 0 3
0 1 0 0
0 0 0 0
1 0 0 0
0 0 10 0
0 0 0 0
Has_MCS_B Has_MCS_B Has_MCS_B Has_MCS_B \
0 0 0 6
0 0 0 0
0 9 0 0
10 0 0 0
0 0 0 0
0 0 7 0
I want to sum the values of the columns that share a column name, so that the final result is a dataframe with only 27 columns.
You can construct a new df, iterate over the unique column names, and assign to each new column the row-wise sum of all the columns that share that name:
In [21]:
import io
import pandas as pd
t="""Has_MCS_A Has_MCS_A Has_MCS_A Has_MCS_A
0 0 0 3
0 1 0 0
0 0 0 0
1 0 0 0
0 0 10 0
0 0 0 0 """
df = pd.read_csv(io.StringIO(t), sep=r'\s+')
df
Out[21]:
Has_MCS_A Has_MCS_A.1 Has_MCS_A.2 Has_MCS_A.3
0 0 0 0 3
1 0 1 0 0
2 0 0 0 0
3 1 0 0 0
4 0 0 10 0
5 0 0 0 0
In [22]:
# overwrite the columns to force duplicate names
df.columns = ['Has_MCS_A','Has_MCS_A','Has_MCS_A','Has_MCS_A']
df
Out[22]:
Has_MCS_A Has_MCS_A Has_MCS_A Has_MCS_A
0 0 0 0 3
1 0 1 0 0
2 0 0 0 0
3 1 0 0 0
4 0 0 10 0
5 0 0 0 0
In [23]:
# construct a new df
new_df = pd.DataFrame()
for col in df.columns.unique():
    # df[col] returns every column named col (a DataFrame when the name is duplicated)
    new_df[col] = df[col].sum(axis=1)
new_df
Out[23]:
Has_MCS_A
0 3
1 1
2 0
3 1
4 10
5 0
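The loop above relies on df[col] returning a DataFrame, which is only the case when a name occurs more than once. As a possible alternative, here is a minimal sketch that groups on the column labels instead, so names that appear only once should pass through unchanged. The helper name sum_duplicate_columns, the small sample frame, and the extra "Other" column are made up for illustration and are not part of the original data.

```python
import pandas as pd

def sum_duplicate_columns(df):
    """Sum columns that share a name; hypothetical helper for illustration."""
    # Transpose so the column labels become the index, group the rows that
    # share a label, sum each group, then transpose back.
    # sort=False keeps the labels in order of first appearance.
    return df.T.groupby(level=0, sort=False).sum().T

# Illustrative data only: four "Has_MCS_A" columns, two "Has_MCS_B" columns
# and one column whose name is not duplicated.
df = pd.DataFrame(
    [[0, 0, 0, 3, 0, 6, 1],
     [0, 1, 0, 0, 0, 0, 2],
     [1, 0, 10, 0, 9, 0, 3]],
    columns=["Has_MCS_A"] * 4 + ["Has_MCS_B"] * 2 + ["Other"],
)

result = sum_duplicate_columns(df)
print(result)
```

The double transpose is used here instead of groupby(..., axis=1) because grouping along axis=1 is deprecated in recent pandas releases. Note that transposing a frame with mixed dtypes converts it to object, so this sketch assumes all the columns are numeric.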