Merge two columns from multiple panda series dataframes based on string matching from two columns with different values

Question

I need to merge two columns from a pandas series dataframe together on the last 4 digits of the first column pack_number . I currently have 2 dataframes with a different number of columns.

So far, I thought about extracting the last 4 digits of the ROOT_VIN but I'm not sure how to proceed with column matching and merging these tables together.

Ideally, I'd like to merge the dataframe including pack_number (which includes 6 other columns with different values) into the dataframe including ROOT_VIN which also has other columns with different lengths. The goal is to include the pack_number and match it with its ROOT_VIN .

After merging, i now have 108k rows (matching the 108k rows for df2) and many of the rows are duplicated because df1 only had 451 rows.

Answer 1

If I understand right then you have 2 data frames with a lot of columns.

# df1 # pack_number Table
# df2 # ROOT_VIN Table

# @BeRT2me solution
df2['pack_number'] = df2['ROOT_VIN'].str[-4:]

# joining the 2 dataframes
# we add '_remove' to the duplicate columns names
df3 = df1.merge(df2, how='inner', on='pack_number', suffixes=('', '_remove'))

# remove the columns that end with '_remove'
df3.drop(
    [col_name for col_name in df_3.columns if '_remove' in col_name],
    axis=1, inplace=True
)

Merge two columns from multiple panda series dataframes based on string matching from two columns with different values

Question

1 answers

solution1
1 ACCPTED 2022-06-16 18:37:54

Merge two columns from multiple panda series dataframes based on string matching from two columns with different values

Question

1 answers

solution1 1 ACCPTED 2022-06-16 18:37:54

solution1
1 ACCPTED 2022-06-16 18:37:54