That was not easy to put into one sentence. Basically, I have two datasets that I would like to combine on two data points: the name and the date. I've provided a short example here of how the data is structured: https://ethercalc.net/a4k8lejblmhe
Year  Name     Alternative Name  Favorite Pet
1998  William  Bill              Cat
1995  James    Jim               Dog
1956  Robert   Bob               Hamster

Year  Name     Sales
1998  William  2000
1995  Jim      3005
1956  Bob      6000

EXPECTED:
Year  Name     Sales  Favorite Pet
1998  William  2000   Cat
1995  Jim      3005   Dog
1956  Bob      6000   Hamster
However, one of the datasets has both a name and an alternative name. These are fairly large datasets, so I would like to cover all my bases by merging on the year together with both the name and the alternative name. I know how to combine on just the year and name:
nameCombined = names1.merge(names2, left_on=["Year", "Name"], right_on=["Year", "Name"], how='left')
That being said, what is the best way to use some kind of conditional that says if there's no match between the year and the regular name, check the year and the alternative name before assigning null values for the merge?
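To see which rows the name-only merge leaves unmatched, here is a minimal sketch. It assumes the column names from the sample tables above and rebuilds `names1` and `names2` from that sample; the real datasets may differ. It relies on the `indicator=True` option of `merge`, which adds a `_merge` column to the result.

```python
import pandas as pd

# Sample data rebuilt from the tables in the question (column names are assumed).
names1 = pd.DataFrame({
    "Year": [1998, 1995, 1956],
    "Name": ["William", "James", "Robert"],
    "Alternative Name": ["Bill", "Jim", "Bob"],
    "Favorite Pet": ["Cat", "Dog", "Hamster"],
})
names2 = pd.DataFrame({
    "Year": [1998, 1995, 1956],
    "Name": ["William", "Jim", "Bob"],
    "Sales": [2000, 3005, 6000],
})

# indicator=True adds a "_merge" column: "both" for matched rows,
# "left_only" for rows of names1 that found no partner in names2.
checked = names1.merge(names2, on=["Year", "Name"], how="left", indicator=True)
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched[["Year", "Name", "Alternative Name"]])
```

On the sample data, the rows for James and Robert come back as `left_only`, which are exactly the rows that need the alternative-name lookup.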
Left merge on ["Year", "Name"], then (separately) left merge on ["Year", "Alternative Name"], then combine the two results and remove duplicates.
This assumes that the original order doesn't matter; if it does, tell me and I'll show you how to keep it.
import pandas as pd
# Left merge on the regular name.
nameCombined = names1[["Year", "Name", "Favorite Pet"]].merge(names2, on=["Year", "Name"], how='left')
# Left merge on the alternative name, renamed to "Name" so both results share the same columns.
alternativeNames = names1[["Year", "Alternative Name", "Favorite Pet"]].rename(columns={"Alternative Name": "Name"})
AlternativeNameCombined = alternativeNames.merge(names2, on=["Year", "Name"], how='left')
# Stack both results (pd.concat replaces DataFrame.append, removed in pandas 2.0), drop rows with no Sales match, then remove duplicates.
allCombined = pd.concat([nameCombined, AlternativeNameCombined]).dropna(subset=["Sales"]).drop_duplicates(subset=["Year", "Name"], keep="first").reset_index(drop=True)
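If the original row order does matter, or if rows with no match under either name should stay in the result with a null value, one possible variant is to chain two left merges and coalesce the two Sales columns. This is only a sketch under assumptions: the sample data is rebuilt from the question, `Sales_alt` is a made-up helper column name, and `names2` is assumed to have at most one row per Year/Name pair (otherwise rows would multiply).

```python
import pandas as pd

# Sample data rebuilt from the tables in the question (column names are assumed).
names1 = pd.DataFrame({
    "Year": [1998, 1995, 1956],
    "Name": ["William", "James", "Robert"],
    "Alternative Name": ["Bill", "Jim", "Bob"],
    "Favorite Pet": ["Cat", "Dog", "Hamster"],
})
names2 = pd.DataFrame({
    "Year": [1998, 1995, 1956],
    "Name": ["William", "Jim", "Bob"],
    "Sales": [2000, 3005, 6000],
})

# First attempt: match on the regular name. Unmatched rows get NaN in "Sales".
by_name = names1.merge(names2, on=["Year", "Name"], how="left")

# Second attempt: match names2's "Name" against names1's "Alternative Name".
# "Sales_alt" is a hypothetical helper column holding the fallback value.
alt_lookup = names2.rename(columns={"Name": "Alternative Name", "Sales": "Sales_alt"})
both = by_name.merge(alt_lookup, on=["Year", "Alternative Name"], how="left")

# Prefer the regular-name match; fall back to the alternative-name match.
both["Sales"] = both["Sales"].fillna(both["Sales_alt"])
result = both.drop(columns=["Sales_alt"])
print(result)
```

The result keeps one row per row of `names1`, in the same order, and keeps the `Name` spelling from `names1` rather than the one from `names2`. Because unmatched rows pass through NaN, the `Sales` column comes back as floats.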
Here is an example using two inner joins + concat:
import pandas as pd

df1 = pd.DataFrame({
'Year': (1998, 1995, 1956,),
'Name': ('William', 'James', 'Robert'),
'Alternative Name': ('Bill', 'Jim', 'Bob'),
'Favorite Pet': ('Cat', 'Dog', 'Hamster'),
})
df2 = pd.DataFrame({
'Year': (1998, 1995, 1956,),
'Name': ('William', 'Jim', 'Bob'),
'Sales': (2000, 3005, 6000),
})
# by Name
df = df1.drop(columns=['Alternative Name']).merge(df2, on=['Year', 'Name'])
# by Alternative Name
df1 = df1.drop(columns=['Name']).rename(columns={'Alternative Name': 'Name'})
# union
df = pd.concat([
df,
df2.merge(df1, on=['Year', 'Name'])
], sort=False)
print(df)
# Year Name Favorite Pet Sales
# 0 1998 William Cat 2000
# 0 1995 Jim Dog 3005
# 1 1956 Bob Hamster 6000
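Another way to express the same idea with a single merge is to reshape the first table so that every known spelling of a name gets its own row, then join once. The sketch below is an assumption-laden alternative, not part of the answer above: it rebuilds the sample frames from scratch, and `Lookup Name` is a made-up intermediate column name.

```python
import pandas as pd

# Sample data rebuilt from the question (fresh copies, not the modified df1 above).
df1 = pd.DataFrame({
    "Year": [1998, 1995, 1956],
    "Name": ["William", "James", "Robert"],
    "Alternative Name": ["Bill", "Jim", "Bob"],
    "Favorite Pet": ["Cat", "Dog", "Hamster"],
})
df2 = pd.DataFrame({
    "Year": [1998, 1995, 1956],
    "Name": ["William", "Jim", "Bob"],
    "Sales": [2000, 3005, 6000],
})

# Stack "Name" and "Alternative Name" into one column, so each person
# appears once per spelling. "Lookup Name" is a hypothetical column name.
long = df1.melt(
    id_vars=["Year", "Favorite Pet"],
    value_vars=["Name", "Alternative Name"],
    value_name="Lookup Name",
)
lookup = (
    long[["Year", "Lookup Name", "Favorite Pet"]]
    .rename(columns={"Lookup Name": "Name"})
    .drop_duplicates()  # guards against Name == Alternative Name
)

# One left merge now covers both spellings.
merged = df2.merge(lookup, on=["Year", "Name"], how="left")
print(merged)
```

This keeps the row order and name spelling of `df2`, and any `df2` row with no match under either spelling keeps a null `Favorite Pet` instead of being dropped.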