
Combine two columns in a pandas DataFrame

I have a DataFrame with multiple columns, some of which are equivalent: they share the same key at the trailing end (e.g. column1 = 'a/first', column2 = 'b/first'). I want to merge each such pair into a single column. Please help me solve this.

My DataFrame looks like this:

name   g1/column1  g1/column2  g1/g2/column1  g2/column2
AAAA   10          20          nan            nan
AAAA   nan         nan         30             40

The result I want is:

name   g1/column1  g1/column2
AAAA   10          20
AAAA   30          40

Thanks in advance.
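
To make the question reproducible, here is a minimal sketch that rebuilds the example frame from the table above and shows what "the same key at the trailing end" means in code. It assumes the key is always the text after the last '/', and that 'name' is the only column that should not be merged; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (values copied from the table)
df = pd.DataFrame(
    {
        "name": ["AAAA", "AAAA"],
        "g1/column1": [10, np.nan],
        "g1/column2": [20, np.nan],
        "g1/g2/column1": [np.nan, 30],
        "g2/column2": [np.nan, 40],
    }
)

# Assumption: the shared key is the text after the last '/'
value_cols = [c for c in df.columns if c != "name"]
keys = {c: c.rsplit("/", 1)[-1] for c in value_cols}
print(keys)
# {'g1/column1': 'column1', 'g1/column2': 'column2', 'g1/g2/column1': 'column1', 'g2/column2': 'column2'}
```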

You need df.combine_first, which fills the NaN values of one frame with the values at the same positions in another:

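import pandas as pd

col1 = ['g1/column1', 'g1/column2']      # columns to keep
col2 = ['g1/g2/column1', 'g2/column2']   # matching columns to merge into them

# Relabel col2's data with col1's names (and df's index) so combine_first aligns them,
# then fill the NaNs in col1 from col2
df[col1] = df[col1].combine_first(pd.DataFrame(df[col2].values, columns=col1, index=df.index))

df = df.drop(col2, axis=1)

print(df)
#   name  g1/column1  g1/column2
#0  AAAA        10.0        20.0
#1  AAAA        30.0        40.0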
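
The combine_first answer hard-codes the two column lists. A minimal sketch of the same idea that builds the pairs automatically is below. `merge_by_suffix` is a hypothetical helper, not a pandas API; it assumes the key is the text after the last '/' and that the first column seen for each key should keep its name.

```python
import numpy as np
import pandas as pd


def merge_by_suffix(df, id_cols=("name",)):
    """Hypothetical helper: coalesce columns sharing the text after the last '/'.

    The first column seen for each key keeps its name; its NaNs are filled
    from the later columns with the same key via Series.combine_first.
    """
    out = df[list(id_cols)].copy()
    groups = {}  # key -> list of column names, in original order
    for col in df.columns:
        if col in id_cols:
            continue
        groups.setdefault(col.rsplit("/", 1)[-1], []).append(col)
    for cols in groups.values():
        merged = df[cols[0]]
        for other in cols[1:]:
            # keep existing values, fill NaN from the other column
            merged = merged.combine_first(df[other])
        out[cols[0]] = merged
    return out


df = pd.DataFrame(
    {
        "name": ["AAAA", "AAAA"],
        "g1/column1": [10, np.nan],
        "g1/column2": [20, np.nan],
        "g1/g2/column1": [np.nan, 30],
        "g2/column2": [np.nan, 40],
    }
)
result = merge_by_suffix(df)
print(result)
```

Because plain dicts keep insertion order, the output columns follow the order in which each key first appears.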

Use a MultiIndex on the columns and take the first non-NaN value per trailing key:

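import pandas as pd

#move the columns that should not be merged into the index
df = df.set_index('name')
#MultiIndex by splitting on the last /
df.columns = df.columns.str.rsplit('/', n=1, expand=True)
#take the first non-NaN value per second level of the MultiIndex
#(groupby(level=1, axis=1) worked in older pandas but axis=1 is deprecated since 2.1,
# so group the transpose by rows instead)
df = df.T.groupby(level=1).first().T
print (df)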
      column1  column2
name                  
AAAA     10.0     20.0
AAAA     30.0     40.0
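
The same "first non-NaN value per key" can also be written without a MultiIndex or a transpose, by backfilling along each row. This is a minimal sketch of that alternative idiom, not part of the answer above; it assumes the key is the text after the last '/' and that 'name' is the only column to leave alone.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["AAAA", "AAAA"],
        "g1/column1": [10, np.nan],
        "g1/column2": [20, np.nan],
        "g1/g2/column1": [np.nan, 30],
        "g2/column2": [np.nan, 40],
    }
)

# Group the value columns by the text after the last '/', in first-seen order
groups = {}
for col in df.columns.drop("name"):
    groups.setdefault(col.rsplit("/", 1)[-1], []).append(col)

result = df[["name"]].copy()
for key, cols in groups.items():
    # bfill(axis=1) copies the next non-NaN value leftwards along each row,
    # so the first column of the group ends up holding the first non-NaN value
    result[key] = df[cols].bfill(axis=1).iloc[:, 0]

print(result)
```

Keeping 'name' as an ordinary column avoids assigning into a frame whose index has duplicate labels (AAAA, AAAA).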

One possible solution:

import numpy as np
import pandas as pd

df = pd.DataFrame([[10, 20, np.nan, np.nan],
                  [np.nan, np.nan, 30, 40]],
                 columns=['g1/column1', 'g1/column2', 'g1/g2/column1', 'g2/column2'])
df

   g1/column1   g1/column2  g1/g2/column1   g2/column2
0   10.0        20.0        NaN             NaN
1   NaN         NaN         30.0            40.0

df = df.fillna(0)  # <- replacing all NaN with 0

ndf = pd.DataFrame()

unique_cols = ['column1', 'column2']

for key in unique_cols:
    # all columns whose name ends with this key, e.g. 'g1/column1' and 'g1/g2/column1'
    val = df.columns[df.columns.str.endswith('/' + key)]
    # sum across each row (axis=1) and keep the first matching column's name
    ndf[val[0]] = df.loc[:, val].sum(axis=1)

ndf  # <- You can add index if you need (AAAA, AAAA)

    g1/column1  g1/column2
0   10.0        20.0
1   30.0        40.0
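
Summing only reproduces "merge" when at most one column per key holds a value in each row; if both are filled, the values get added. A minimal sketch of a variant that skips fillna(0) and uses `sum(axis=1, min_count=1)`, so a row with no values at all stays NaN instead of becoming 0, is below. The third, all-NaN row is a hypothetical addition for illustration only.

```python
import numpy as np
import pandas as pd

# Same columns as above, plus a hypothetical third row with no values
df = pd.DataFrame(
    [[10, 20, np.nan, np.nan],
     [np.nan, np.nan, 30, 40],
     [np.nan, np.nan, np.nan, np.nan]],
    columns=["g1/column1", "g1/column2", "g1/g2/column1", "g2/column2"],
)

ndf = pd.DataFrame(index=df.index)
for key in ["column1", "column2"]:
    cols = [c for c in df.columns if c.endswith("/" + key)]
    # Row-wise sum that skips NaN; min_count=1 leaves an all-NaN row as NaN
    ndf[cols[0]] = df[cols].sum(axis=1, min_count=1)

print(ndf)
```
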

Another option is to fill NaN with 0 and add the matching columns together:

import pandas as pd
import numpy as np

g1 = [20, np.nan, 30, np.nan]
g1_2 = [10, np.nan, 20, np.nan]
g2 = [np.nan, 30, np.nan, 40]
g2_2 = [np.nan, 10, np.nan, 30]

dataList = list(zip(g1, g1_2, g2, g2_2))
df = pd.DataFrame(data=dataList, columns=['g1/column1', 'g1/column2', 'g1/g2/column1', 'g2/column2'])

df = df.fillna(0)

# add each pair of matching columns (only one of the two is non-zero per row)
df['column1Combined'] = df['g1/column1'] + df['g1/g2/column1']
df['column2Combined'] = df['g1/column2'] + df['g2/column2']
# drop the original columns in a single call
df = df.drop(columns=['g1/column1', 'g1/column2', 'g1/g2/column1', 'g2/column2'])
df
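
Filling with 0 turns genuinely missing values into zeros. As a minimal sketch of an alternative (not the answer's method), `Series.fillna` also accepts another Series and fills the NaNs from it, aligned on the index, so nothing is added and no zeros are introduced. The data is the same as in the answer; the output column names 'column1' and 'column2' are chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "g1/column1": [20, np.nan, 30, np.nan],
        "g1/column2": [10, np.nan, 20, np.nan],
        "g1/g2/column1": [np.nan, 30, np.nan, 40],
        "g2/column2": [np.nan, 10, np.nan, 30],
    }
)

# Fill each NaN from the matching column at the same index position
df["column1"] = df["g1/column1"].fillna(df["g1/g2/column1"])
df["column2"] = df["g1/column2"].fillna(df["g2/column2"])
df = df.drop(columns=["g1/column1", "g1/column2", "g1/g2/column1", "g2/column2"])
print(df)
```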
