

Join two dataframes on two columns with similar strings Python

I have two dataframes (df1 and df2) that I want to left merge using two columns, 'State' (e.g. Arkansas) and 'County' (e.g. Union). (Union is a county in Arkansas.)

df1 and df2 need to match on both 'State' and 'County', but df2 has county names with additional words (e.g. Woodmont County Borough) that are not present in the df1 county names (e.g. Woodmont).

What can I do to left merge these two dataframes when the counties are represented differently? I have many county names.
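To make the problem concrete, here is a minimal reproduction on hypothetical toy data (all values, including the state assigned to Woodmont, are made up for illustration). A plain left merge on both columns finds no match for either county, because df2 carries extra words in its County strings:

```python
import pandas as pd

# Hypothetical toy data: df1 uses bare county names,
# df2 appends extra words such as "County" or "County Borough"
df1 = pd.DataFrame({
    "State": ["Arkansas", "Arkansas"],
    "County": ["Union", "Woodmont"],
    "a": [1, 2],
})
df2 = pd.DataFrame({
    "State": ["Arkansas", "Arkansas"],
    "County": ["Union County", "Woodmont County Borough"],
    "b": [10, 20],
})

# Exact-match merge: the County strings differ, so nothing from df2 is joined
naive = df1.merge(df2, how="left", on=["State", "County"])
print(naive)
```

Every value in column b comes back as NaN, which is the symptom the question describes.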

First, get a list of the unique 'County' values in df1 (county_list).

Then, create a new column in df2, which we will call County_cleaned, to hold the county from county_list that is found inside each df2.County value.

Then, for each county in county_list, if it appears in df2['County'], place it in the newly created County_cleaned column.
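The "appears in" test is a substring match: str.contains finds the county name anywhere inside df2['County'], so a short hypothetical name such as "Clay" would also match "Clayton County". If that matters for your data, one possible variation (not part of the original answer) is to require a whole-word match with a regex word boundary, escaping the county name with re.escape. The data below is hypothetical:

```python
import re

import pandas as pd

# Hypothetical data showing a false positive from plain substring matching
df2 = pd.DataFrame({"County": ["Clay County", "Clayton County"]})

c = "Clay"
# Literal substring test: matches "Clay" inside "Clayton" too
substring_hit = df2["County"].str.contains(c, regex=False)
# \b anchors the match to word boundaries; re.escape neutralises regex metacharacters in the name
word_hit = df2["County"].str.contains(rf"\b{re.escape(c)}\b", regex=True)

print(substring_hit.tolist())  # [True, True]
print(word_hit.tolist())       # [True, False]
```

Inside the loop, the word-boundary pattern would simply replace the plain str.contains(c) call.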

Now you can merge df1 and df2 (let's call the result df3) using this newly created column in df2:

# get a list of the unique counties in df1
county_list = df1.County.unique()

# initialise a new column to an empty string
df2['County_cleaned'] = ''

# for each county in df1, if it appears somewhere in df2.County,
# write it into the newly created County_cleaned column
# (regex=False treats the county name as a literal substring, not a regex pattern)
for c in county_list:
    df2.loc[df2['County'].str.contains(c, regex=False), 'County_cleaned'] = c

# merge the two dataframes to create df3
df3 = df1.merge(df2, how='inner', left_on=['State', 'County'], right_on=['State', 'County_cleaned'])
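For a quick check, here is a self-contained version of the same approach run on hypothetical toy data (all values are made up), using how='left' as the question asks. Because both frames have a County column and they are joined on different keys, pandas adds its default _x and _y suffixes, giving County_x (from df1) and County_y (from df2):

```python
import pandas as pd

# Hypothetical toy data (values made up for illustration)
df1 = pd.DataFrame({
    "State": ["Arkansas", "Arkansas", "Arkansas"],
    "County": ["Union", "Woodmont", "Pike"],
    "a": [1, 2, 3],
})
df2 = pd.DataFrame({
    "State": ["Arkansas", "Arkansas"],
    "County": ["Union County", "Woodmont County Borough"],
    "b": [10, 20],
})

# Tag each df2 row with the df1 county name contained in its County string
df2["County_cleaned"] = ""
for c in df1["County"].unique():
    df2.loc[df2["County"].str.contains(c, regex=False), "County_cleaned"] = c

# Left merge keeps every df1 row; rows with no match get NaN in df2's columns
df3 = df1.merge(
    df2,
    how="left",
    left_on=["State", "County"],
    right_on=["State", "County_cleaned"],
)
print(df3[["State", "County_x", "County_y", "b"]])
```

Union and Woodmont pick up their df2 rows, while Pike, which has no counterpart in df2, is kept with NaN values.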

Note: I've set how='inner', but this can also be 'outer', 'left' or 'right' depending on the type of join you need; for the left merge asked about in the question, use how='left'.
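When choosing the join type, pandas' indicator=True option can help: it adds a _merge column marking each row as 'left_only', 'right_only' or 'both', which makes county names that the substring step failed to clean easy to spot. A minimal sketch on hypothetical toy data, using how='outer' so unmatched rows from both sides are kept ("Unknown Borough" is a made-up name that matches nothing in df1):

```python
import pandas as pd

# Hypothetical toy data; df2 already has the County_cleaned column from the step above
df1 = pd.DataFrame({"State": ["Arkansas", "Arkansas"], "County": ["Union", "Pike"]})
df2 = pd.DataFrame({
    "State": ["Arkansas", "Arkansas"],
    "County": ["Union County", "Unknown Borough"],
    "County_cleaned": ["Union", ""],
})

df3 = df1.merge(
    df2,
    how="outer",
    left_on=["State", "County"],
    right_on=["State", "County_cleaned"],
    indicator=True,
)
# Rows whose _merge value is not 'both' had no counterpart on the other side
print(df3[["County_x", "County_y", "_merge"]])
```

Filtering on df3["_merge"] != "both" then lists the rows worth inspecting by hand.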

