add selected columns from two pandas dfs

Question

I have two pandas DataFrames, a_df and b_df. a_df has columns ID, atext, and var1-var25, and b_df has the same columns: ID, atext, and var1-var25.

I want to add ONLY the corresponding var columns from a_df and b_df and leave ID and atext alone.

The code below adds ALL the corresponding columns. Is there a way to get it to add just the columns of interest?

absum_df = a_df.add(b_df)

What could I do to achieve this?

Answer 1

Use filter to select only the columns whose names contain 'var':

absum_df = a_df.filter(like='var').add(b_df.filter(like='var'))

If you want to keep additional columns as-is, use concat after summing:

absum_df = pd.concat([a_df[['ID', 'atext']], absum_df], axis=1)
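As a quick check, the two steps above can be run on a small pair of frames. The data below is made up for illustration and uses only var1-var3 instead of the 25 columns in the question; var_sum is a hypothetical name for the intermediate result.

```python
import pandas as pd

# Hypothetical toy data: same layout as in the question, but only var1-var3.
a_df = pd.DataFrame({'ID': [1, 2], 'atext': ['x', 'y'],
                     'var1': [1, 2], 'var2': [3, 4], 'var3': [5, 6]})
b_df = pd.DataFrame({'ID': [1, 2], 'atext': ['x', 'y'],
                     'var1': [10, 20], 'var2': [30, 40], 'var3': [50, 60]})

# Sum only the columns whose label contains the substring 'var'.
var_sum = a_df.filter(like='var').add(b_df.filter(like='var'))

# Put ID and atext from a_df back in front of the summed columns.
absum_df = pd.concat([a_df[['ID', 'atext']], var_sum], axis=1)
print(absum_df)
```

Note that filter(like='var') keeps every column whose label contains 'var' anywhere. If other columns could match by accident, a stricter selection such as filter(regex=r'^var\d+$') may be safer.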

Alternatively, instead of selecting columns from a_df by name, you can drop the columns that are already in absum_df. This keeps every column of a_df that is not in absum_df:

absum_df = pd.concat([a_df.drop(absum_df.columns, axis=1), absum_df], axis=1)
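A minimal sketch of the drop variant, again on made-up data. The extra column note is hypothetical (it is not in the question) and is there only to show that every non-summed column of a_df is carried over without being listed by name.

```python
import pandas as pd

# Hypothetical toy data; 'note' is an extra column that is not in the question.
a_df = pd.DataFrame({'ID': [1, 2], 'atext': ['x', 'y'], 'note': ['p', 'q'],
                     'var1': [1, 2], 'var2': [3, 4]})
b_df = pd.DataFrame({'ID': [1, 2], 'atext': ['x', 'y'], 'note': ['p', 'q'],
                     'var1': [10, 20], 'var2': [30, 40]})

# Sum only the var columns.
var_sum = a_df.filter(like='var').add(b_df.filter(like='var'))

# Drop the summed columns from a_df, then attach the sums to what is left.
absum_df = pd.concat([a_df.drop(var_sum.columns, axis=1), var_sum], axis=1)
print(absum_df.columns.tolist())
```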

Answer 2

You can subset a dataframe to particular columns:

var_columns = ['var{}'.format(i) for i in range(1, 26)]  # 'var1' ... 'var25'
absum_df = a_df[var_columns].add(b_df[var_columns])

Note that this will result in a dataframe with only the var columns. If you want a dataframe with the non-var columns from a_df, and the var columns being the sum of a_df and b_df, you can do:

absum_df = a_df.copy()
absum_df[var_columns] = a_df[var_columns].add(b_df[var_columns])
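Here is a small runnable sketch of the copy-and-assign approach. It assumes the columns are named var1, var2, ... with no separator, as described in the question; the data is made up and n_vars is a hypothetical helper set to 3 rather than 25 to keep the example short.

```python
import pandas as pd

n_vars = 3  # hypothetical; the question has 25
var_columns = ['var{}'.format(i) for i in range(1, n_vars + 1)]

# Hypothetical toy data with the same layout as in the question.
a_df = pd.DataFrame({'ID': [1, 2], 'atext': ['x', 'y'],
                     'var1': [1, 2], 'var2': [3, 4], 'var3': [5, 6]})
b_df = pd.DataFrame({'ID': [1, 2], 'atext': ['x', 'y'],
                     'var1': [10, 20], 'var2': [30, 40], 'var3': [50, 60]})

# Work on a copy so a_df itself is left unchanged.
absum_df = a_df.copy()

# Overwrite only the var columns with the element-wise sums.
absum_df[var_columns] = a_df[var_columns].add(b_df[var_columns])
print(absum_df)
```

Because the result starts as a copy of a_df, the original column order is preserved and ID and atext keep the values from a_df.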

2 answers

solution1 2 2018-05-02 22:04:19
solution2 1 ACCEPTED 2018-05-02 22:15:21
