
Match partial strings across two data frames and merge

I have two data frames, as given below. The master data:

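import pandas as pd

df1 = pd.DataFrame(
    data=
    [[['mayor', 'lord'], 25],
     [['prince', 'sheriff'], 33],
     [['king george', 'queen'], 32],
     [['bristol, counsellor'], 43],
     [['exchequer'], 45]],
    columns=['V1', 'V2']
)
# Replace each list in V1 with its string representation
# (on pandas >= 2.1, DataFrame.map is the non-deprecated name for applymap)
df1.update(df1[['V1']].applymap('{}'.format))
print(df1)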

and the other one is:

df2 = pd.DataFrame(
    data=
    [['mayor', 55],
     ['sheriff', 54],
     ['counsellor', 53],
     ['exchequer', 50]],
    columns=['Var1', 'Var2']
)
print(df2)

I want to map Var2 from df2 onto the master data wherever V1 contains a corresponding Var1 value. That is, my end goal dataframe is:

dfgoal = pd.DataFrame(
    data=
    [[['mayor', 'lord'], 25, 55],
     [['prince', 'sheriff'], 33, 54],
     [['king george', 'queen'], 32, None],
     [['bristol, counsellor'], 43, 53],
     [['exchequer'], 45, 50]],
    columns=['V1', 'V2', 'V3']
)
dfgoal.update(dfgoal[['V1']].applymap('{}'.format))
print(dfgoal)

I have tried quite a few things from Stack Overflow (like this), but since V1 is a list object and the match can be in any element of the list (like the counsellor observation), I have been unsuccessful so far.

I think this solves it:

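import pandas as pd

df1 = pd.DataFrame(
    data=
    [[['mayor', 'lord'], 25],
     [['prince', 'sheriff'], 33],
     [['king george', 'queen'], 32],
     # 'bristol' and 'counsellor' are separate list elements here
     [['bristol', 'counsellor'], 43],
     [['exchequer'], 45]],
    columns=['V1', 'V2']
)
# Keep V1 as real lists, so the string conversion is left out:
# df1.update(df1[['V1']].applymap('{}'.format))
print(df1)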


df2 = pd.DataFrame(
    data=
    [['mayor', 55],
     ['sheriff', 54],
     ['counsellor', 53],
     ['exchequer', 50]],
    columns=['Var1', 'Var2']
)
print(df2)


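# Build a lookup dictionary from Var1 to Var2
mapper = df2.set_index('Var1')['Var2'].to_dict()

def get_V3(l):
    # Return the Var2 value of the first element of l that appears in df2
    for x in l:
        if x in mapper:
            return mapper[x]
    return None

df1['V3'] = df1['V1'].apply(get_V3)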

I'm assuming that if two or more elements appear in df2, we use the first one, hence the return inside the loop.
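
Note that the lookup above only matches when a list element is exactly equal to a Var1 value. If an element holds several words in one string, as with 'bristol, counsellor' in the question's original data, or if V1 has already been converted to a string, a substring test may be closer to what was asked. The following is a rough sketch of that variant, not a tested drop-in replacement; the helper name get_v3_partial is made up for illustration, and wrapping a plain string into a one-element list is my assumption about how stringified V1 values should be handled.

```python
import pandas as pd

df1 = pd.DataFrame(
    data=[
        [['mayor', 'lord'], 25],
        [['prince', 'sheriff'], 33],
        [['king george', 'queen'], 32],
        [['bristol, counsellor'], 43],  # a single element, as in the question
        [['exchequer'], 45],
    ],
    columns=['V1', 'V2'],
)
df2 = pd.DataFrame(
    data=[['mayor', 55], ['sheriff', 54], ['counsellor', 53], ['exchequer', 50]],
    columns=['Var1', 'Var2'],
)

# Lookup dictionary from Var1 to Var2, as in the answer
mapper = df2.set_index('Var1')['Var2'].to_dict()

def get_v3_partial(elements):
    """Return Var2 for the first element that contains a Var1 value as a substring."""
    # Assumption: a plain string (e.g. a stringified list) is treated as one element
    if isinstance(elements, str):
        elements = [elements]
    for element in elements:
        for key, value in mapper.items():
            if key in element:  # substring test instead of exact equality
                return value
    return None

df1['V3'] = df1['V1'].apply(get_v3_partial)
print(df1)
```

As before, the first match wins, and rows without any match get a missing value. Be aware that a plain substring test can over-match (a key such as 'king' would also match 'viking'), so splitting each element on the separator and comparing whole words may be safer if that matters for your data.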
