
How to correctly merge/join a feature to another dataframe

I have two dataframes and I want to add 3 features from the first dataframe to the second, but ONLY where they match on a certain key value (TicketNr). This key is not unique and can occur multiple times in both dataframes.

I have tried different versions of concat, merge and join, but I can't get it the way I need. I don't want to add any rows to the dataframe, just these three columns.

I think this illustration sums up my question. Who can help me with the right code? [image: enter image description here]

You mentioned that TicketNr is not unique in the training set. If it is safe to assume that TicketSurvRate, AllSurvived and AllDIED are identical for all rows that share a TicketNr, you could try the following:

# Drop duplicates to get one row per TicketNr, assuming that
# TicketSurvRate, AllSurvived, AllDIED are uniquely defined by TicketNr 
x = engineered_train[
    ['TicketNr', 'TicketSurvRate', 'AllSurvived', 'AllDIED']].drop_duplicates()

# Merge test dataset with these de-duplicated stats.
# The how='left' parameter will keep all records from the test set.
# There will be `NaN`s where no match for TicketNr is found.
# merge() returns a new dataframe, so assign the result back.
engineered_test = engineered_test.merge(x, how='left', on='TicketNr')
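
If you want to check that assumption before relying on it, here is a minimal sketch on made-up data. The toy values, the PassengerId column and the names `stat_cols`, `stats` and `merged` are hypothetical and only there for illustration. It verifies that each TicketNr maps to a single set of stats, then lets pandas enforce the same thing through the `validate` argument of `merge`, and finally confirms that no rows were added to the test set.

```python
import pandas as pd

# Hypothetical toy data standing in for the dataframes in the question.
engineered_train = pd.DataFrame({
    "TicketNr": [1, 1, 2, 3, 3],
    "TicketSurvRate": [0.5, 0.5, 1.0, 0.0, 0.0],
    "AllSurvived": [0, 0, 1, 0, 0],
    "AllDIED": [0, 0, 0, 1, 1],
})
engineered_test = pd.DataFrame({
    "PassengerId": [10, 11, 12, 13],  # hypothetical extra column
    "TicketNr": [1, 1, 3, 4],         # TicketNr 4 has no match in train
})

stat_cols = ["TicketSurvRate", "AllSurvived", "AllDIED"]

# One row per distinct (TicketNr, stats) combination.
stats = engineered_train[["TicketNr"] + stat_cols].drop_duplicates()

# Check the assumption: after de-duplication every TicketNr appears once.
# If this fails, the stats differ within a ticket and must be aggregated first.
assert not stats["TicketNr"].duplicated().any(), "stats differ within a TicketNr"

# validate="many_to_one" makes pandas raise a MergeError if the keys on the
# right-hand side are not unique, which is what would otherwise add rows.
merged = engineered_test.merge(
    stats, how="left", on="TicketNr", validate="many_to_one"
)

# A left merge against unique keys keeps the row count of the test set.
assert len(merged) == len(engineered_test)
print(merged)
```

Rows whose TicketNr is missing from the training set (TicketNr 4 in this toy data) get NaN in the three new columns, and integer columns such as AllSurvived become float as a result. Whether to fill those NaNs, and with what, depends on your model.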
