
Merge one column from multiple dataframes to another dataframe based on multiple conditions in Python

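Suppose I have a combined dataframe named df, shown below. Each row holds the information for one building and for the building it was matched with. I want to merge in each building's id from df1, df2 and df3 (see below). The columns df_num and matched_df_num indicate which dataframe the building information comes from: 1 means df1, 2 means df2, and 3 means df3.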

   df_num       city                             name  matched_df_num  \
0      1   Shenzhen                      Kingkey 100              2   
1      2   Shenzhen           Ping An Finance Centre              2   
2      2   Shenzhen           Ping An Finance Centre              3   
3      2  Guangzhou     Guangzhou CTF Finance Centre              3   
4      3   Shanghai  Shanghai World Financial Center              3   

  matched_city                     matched_name  similarity_ratio  
0     Shenzhen           Ping An Finance Centre                51  
1    Guangzhou     Guangzhou CTF Finance Centre                66  
2     Shanghai  Shanghai World Financial Center                59  
3     Shanghai  Shanghai World Financial Center                56  
4     Changsha            Changsha IFS Tower T1                57  
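
For anyone who wants to reproduce this, the table above can be rebuilt roughly as follows. This is only a sketch transcribed from the printed output; the original construction code is not shown, so the integer dtypes for df_num, matched_df_num and similarity_ratio are an assumption.

```python
import pandas as pd

# Combined frame transcribed from the printed table.
# Assumption: the three numeric columns are plain integers.
df = pd.DataFrame({
    'df_num': [1, 2, 2, 2, 3],
    'city': ['Shenzhen', 'Shenzhen', 'Shenzhen', 'Guangzhou', 'Shanghai'],
    'name': ['Kingkey 100',
             'Ping An Finance Centre',
             'Ping An Finance Centre',
             'Guangzhou CTF Finance Centre',
             'Shanghai World Financial Center'],
    'matched_df_num': [2, 2, 3, 3, 3],
    'matched_city': ['Shenzhen', 'Guangzhou', 'Shanghai', 'Shanghai', 'Changsha'],
    'matched_name': ['Ping An Finance Centre',
                     'Guangzhou CTF Finance Centre',
                     'Shanghai World Financial Center',
                     'Shanghai World Financial Center',
                     'Changsha IFS Tower T1'],
    'similarity_ratio': [51, 66, 59, 56, 57],
})
print(df)
```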

I want to merge the id column from df1, df2 and df3, once for the building's own name and once for the matched name:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array([
    [1010667747, 'Suzhou', 'Suzhou IFS'],
    [1010667356, 'Shenzhen', 'Kingkey 100'],
    [1010667289, 'Wuhan', 'Wuhan Center']]),
    columns=['id', 'city', 'name']
)
df2 = pd.DataFrame(np.array([
    [190010, 'Shenzhen', 'Ping An Finance Centre'],
    [190012, 'Guangzhou', 'Guangzhou CTF Finance Centre'],
    [190015, 'Beijing', 'China Zun']]),
    columns=['id', 'city', 'name']
)
df3 = pd.DataFrame(np.array([
    ['ZY-13', 'Shanghai', 'Shanghai World Financial Center'],
    ['ZY-15', 'Hong Kong', 'International Commerce Centre'],
    ['ZY-16', 'Changsha', 'Changsha IFS Tower T1']]),
    columns=['id', 'city', 'name']
)

This is my expected result:

   df_num       city                             name          id  \
0      1   Shenzhen                      Kingkey 100  1010667356   
1      2   Shenzhen           Ping An Finance Centre      190010   
2      2   Shenzhen           Ping An Finance Centre      190010   
3      2  Guangzhou     Guangzhou CTF Finance Centre      190012   
4      3   Shanghai  Shanghai World Financial Center       ZY-13   

   matched_df_num  matched_city                     matched_name  \
0              2     Shenzhen           Ping An Finance Centre   
1              2    Guangzhou     Guangzhou CTF Finance Centre   
2              3     Shanghai  Shanghai World Financial Center   
3              3     Shanghai  Shanghai World Financial Center   
4              3     Changsha            Changsha IFS Tower T1   

   similarity_ratio matched_id  
0                51     190010  
1                66     190012  
2                59      ZY-13  
3                56      ZY-13  
4                57      ZY-16  

How can I use Pandas to insert the two new columns id and matched_id, with their values, into df? Thanks in advance for your help.

Update: my attempted solution:

df = (df.merge(df1, on=['city', 'name'], how='left')
        .merge(df2, on=['city', 'name'], how='left')
        .merge(df3, on=['city', 'name'], how='left'))
final_df = (df.merge(df1, left_on=['matched_city', 'matched_name'], right_on=['city', 'name'], how='left')
              .merge(df2, left_on=['matched_city', 'matched_name'], right_on=['city', 'name'], how='left')
              .merge(df3, left_on=['matched_city', 'matched_name'], right_on=['city', 'name'], how='left'))

   df_num     city_x                           name_x  matched_df_num  \
0       1   Shenzhen                      Kingkey 100               2   
1       2   Shenzhen           Ping An Finance Centre               2   
2       2   Shenzhen           Ping An Finance Centre               3   
3       2  Guangzhou     Guangzhou CTF Finance Centre               3   
4       3   Shanghai  Shanghai World Financial Center               3   

  matched_city                     matched_name  similarity_ratio        id_x  \
0     Shenzhen           Ping An Finance Centre                51  1010667356   
1    Guangzhou     Guangzhou CTF Finance Centre                66         NaN   
2     Shanghai  Shanghai World Financial Center                59         NaN   
3     Shanghai  Shanghai World Financial Center                56         NaN   
4     Changsha            Changsha IFS Tower T1                57         NaN   

     id_y   id_x id_y city_y name_y    id_x     city_x  \
0     NaN    NaN  NaN    NaN    NaN  190010   Shenzhen   
1  190010    NaN  NaN    NaN    NaN  190012  Guangzhou   
2  190010    NaN  NaN    NaN    NaN     NaN        NaN   
3  190012    NaN  NaN    NaN    NaN     NaN        NaN   
4     NaN  ZY-13  NaN    NaN    NaN     NaN        NaN   

                         name_x   id_y    city_y  \
0        Ping An Finance Centre    NaN       NaN   
1  Guangzhou CTF Finance Centre    NaN       NaN   
2                           NaN  ZY-13  Shanghai   
3                           NaN  ZY-13  Shanghai   
4                           NaN  ZY-16  Changsha   

                            name_y  
0                              NaN  
1                              NaN  
2  Shanghai World Financial Center  
3  Shanghai World Financial Center  
4            Changsha IFS Tower T1  

You can use concat together with merge and a left join:

dff = pd.concat([df1, df2, df3])
print (dff)
           id       city                             name
0  1010667747     Suzhou                       Suzhou IFS
1  1010667356   Shenzhen                      Kingkey 100
2  1010667289      Wuhan                     Wuhan Center
0      190010   Shenzhen           Ping An Finance Centre
1      190012  Guangzhou     Guangzhou CTF Finance Centre
2      190015    Beijing                        China Zun
0       ZY-13   Shanghai  Shanghai World Financial Center
1       ZY-15  Hong Kong    International Commerce Centre
2       ZY-16   Changsha            Changsha IFS Tower T1

df = df.merge(dff,on = ['city', 'name'], how = 'left')
print (df)
   df_num       city                             name  matched_df_num  \
0       1   Shenzhen                      Kingkey 100               2   
1       2   Shenzhen           Ping An Finance Centre               2   
2       2   Shenzhen           Ping An Finance Centre               3   
3       2  Guangzhou     Guangzhou CTF Finance Centre               3   
4       3   Shanghai  Shanghai World Financial Center               3   

  matched_city                     matched_name  similarity_ratio          id  
0     Shenzhen           Ping An Finance Centre                51  1010667356  
1    Guangzhou     Guangzhou CTF Finance Centre                66      190010  
2     Shanghai  Shanghai World Financial Center                59      190010  
3     Shanghai  Shanghai World Financial Center                56      190012  
4     Changsha            Changsha IFS Tower T1                57       ZY-13

Then merge again, using rename to avoid duplicate columns:

d = {'city':'matched_city','name':'matched_name', 'id':'matched_id'}
df5 = df.merge(dff.rename(columns=d),on = ['matched_city', 'matched_name'], how = 'left')
print (df5)
   df_num       city                             name  matched_df_num  \
0       1   Shenzhen                      Kingkey 100               2   
1       2   Shenzhen           Ping An Finance Centre               2   
2       2   Shenzhen           Ping An Finance Centre               3   
3       2  Guangzhou     Guangzhou CTF Finance Centre               3   
4       3   Shanghai  Shanghai World Financial Center               3   

  matched_city                     matched_name  similarity_ratio          id  \
0     Shenzhen           Ping An Finance Centre                51  1010667356   
1    Guangzhou     Guangzhou CTF Finance Centre                66      190010   
2     Shanghai  Shanghai World Financial Center                59      190010   
3     Shanghai  Shanghai World Financial Center                56      190012   
4     Changsha            Changsha IFS Tower T1                57       ZY-13   

  matched_id  
0     190010  
1     190012  
2      ZY-13  
3      ZY-13  
4      ZY-16   
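
As a side note, a left merge leaves NaN where no match exists and duplicates rows when the lookup contains repeated keys, both without any warning. Below is a minimal sketch of how one might guard against both cases with merge's validate and indicator parameters. The frames lookup and pairs are small made-up examples, not the data from the question.

```python
import pandas as pd

# Hypothetical lookup table: one id per (city, name) pair.
lookup = pd.DataFrame({
    'id': ['190010', 'ZY-13'],
    'city': ['Shenzhen', 'Shanghai'],
    'name': ['Ping An Finance Centre', 'Shanghai World Financial Center'],
})

# Hypothetical rows to enrich; the second one has no entry in lookup.
pairs = pd.DataFrame({
    'city': ['Shenzhen', 'Wuhan'],
    'name': ['Ping An Finance Centre', 'Wuhan Center'],
})

# validate='many_to_one' raises MergeError if (city, name) is duplicated
# in lookup; indicator=True adds a '_merge' column describing each row.
checked = pairs.merge(lookup, on=['city', 'name'], how='left',
                      validate='many_to_one', indicator=True)

# Rows that found no id in the lookup table.
unmatched = checked[checked['_merge'] == 'left_only']
print(unmatched[['city', 'name']])
```

If unmatched is empty and no MergeError is raised, every row received exactly one id.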

EDIT: You can first add a new column to each DataFrame with DataFrame.assign, and then merge on this column as well:

dff = pd.concat([df1.assign(df_num=1), df2.assign(df_num=2), df3.assign(df_num=3)])
df = df.merge(dff,on = ['city', 'name','df_num'], how = 'left')

d = {'city':'matched_city','name':'matched_name', 'id':'matched_id','df_num':'matched_df_num'}
df5 = (df.merge(dff.rename(columns=d), 
                on = ['matched_city', 'matched_name','matched_df_num'], 
                how = 'left'))
print (df5)
   df_num       city                             name  matched_df_num  \
0       1   Shenzhen                      Kingkey 100               2   
1       2   Shenzhen           Ping An Finance Centre               2   
2       2   Shenzhen           Ping An Finance Centre               3   
3       2  Guangzhou     Guangzhou CTF Finance Centre               3   
4       3   Shanghai  Shanghai World Financial Center               3   

  matched_city                     matched_name  similarity_ratio          id  \
0     Shenzhen           Ping An Finance Centre                51  1010667356   
1    Guangzhou     Guangzhou CTF Finance Centre                66      190010   
2     Shanghai  Shanghai World Financial Center                59      190010   
3     Shanghai  Shanghai World Financial Center                56      190012   
4     Changsha            Changsha IFS Tower T1                57       ZY-13   

  matched_id  
0     190010  
1     190012  
2      ZY-13  
3      ZY-13  
4      ZY-16  
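
The reason for adding df_num to the merge keys is easiest to see when two source frames contain the same city and name. The sketch below uses invented data to illustrate this: the frames a, b and left and the ids A-1 and B-9 are hypothetical. Without the extra key the left merge returns one row per source; with it, only the row from the intended source is kept.

```python
import pandas as pd

# Two hypothetical sources that both list the same building.
a = pd.DataFrame({'id': ['A-1'], 'city': ['Shenzhen'], 'name': ['Kingkey 100']})
b = pd.DataFrame({'id': ['B-9'], 'city': ['Shenzhen'], 'name': ['Kingkey 100']})

# A row that is known to come from the second source (df_num == 2).
left = pd.DataFrame({'df_num': [2], 'city': ['Shenzhen'], 'name': ['Kingkey 100']})

# Merging on city and name only: both sources match, so the row is duplicated.
plain = pd.concat([a, b])
ambiguous = left.merge(plain, on=['city', 'name'], how='left')

# Tagging each source and merging on df_num too: exactly one match remains.
tagged = pd.concat([a.assign(df_num=1), b.assign(df_num=2)])
exact = left.merge(tagged, on=['city', 'name', 'df_num'], how='left')

print(len(ambiguous), len(exact))
```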

Try this, it may help you solve your problem:

df1 = pd.DataFrame(np.array([
    [1010667747, 'Suzhou', 'Suzhou IFS'],
    [1010667356, 'Shenzhen', 'Kingkey 100'],
    [1010667289, 'Wuhan', 'Wuhan Center']]),
    columns=['id', 'city', 'name']
)
df2 = pd.DataFrame(np.array([
    [190010, 'Shenzhen', 'Ping An Finance Centre'],
    [190012, 'Guangzhou', 'Guangzhou CTF Finance Centre'],
    [190015, 'Beijing', 'China Zun']]),
    columns=['id', 'city', 'name']
)
df3 = pd.DataFrame(np.array([
    ['ZY-13', 'Shanghai', 'Shanghai World Financial Center'],
    ['ZY-15', 'Hong Kong', 'International Commerce Centre'],
    ['ZY-16', 'Changsha', 'Changsha IFS Tower T1']]),
    columns=['id', 'city', 'name']
)

df1['df_type'] = 1
df2['df_type'] = 2
df3['df_type'] = 3

df = pd.concat([df1,df2,df3])

df
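
The snippet above stops after concatenating the tagged frames. One possible continuation is sketched below, on the assumption that the intent was to use df_type as an extra merge key. The names lookup, combined, own, matched and result are introduced here for illustration and do not come from the answer. The ids are written as strings because np.array coerces mixed int/str rows to strings anyway.

```python
import pandas as pd

cols = ['id', 'city', 'name']
df1 = pd.DataFrame([['1010667747', 'Suzhou', 'Suzhou IFS'],
                    ['1010667356', 'Shenzhen', 'Kingkey 100'],
                    ['1010667289', 'Wuhan', 'Wuhan Center']], columns=cols)
df2 = pd.DataFrame([['190010', 'Shenzhen', 'Ping An Finance Centre'],
                    ['190012', 'Guangzhou', 'Guangzhou CTF Finance Centre'],
                    ['190015', 'Beijing', 'China Zun']], columns=cols)
df3 = pd.DataFrame([['ZY-13', 'Shanghai', 'Shanghai World Financial Center'],
                    ['ZY-15', 'Hong Kong', 'International Commerce Centre'],
                    ['ZY-16', 'Changsha', 'Changsha IFS Tower T1']], columns=cols)

# Same tagging as in the answer, kept under a separate name so the
# combined frame from the question is not overwritten.
lookup = pd.concat([df1.assign(df_type=1), df2.assign(df_type=2),
                    df3.assign(df_type=3)], ignore_index=True)

# The combined frame from the question, transcribed from its printed output.
combined = pd.DataFrame({
    'df_num': [1, 2, 2, 2, 3],
    'city': ['Shenzhen', 'Shenzhen', 'Shenzhen', 'Guangzhou', 'Shanghai'],
    'name': ['Kingkey 100', 'Ping An Finance Centre', 'Ping An Finance Centre',
             'Guangzhou CTF Finance Centre', 'Shanghai World Financial Center'],
    'matched_df_num': [2, 2, 3, 3, 3],
    'matched_city': ['Shenzhen', 'Guangzhou', 'Shanghai', 'Shanghai', 'Changsha'],
    'matched_name': ['Ping An Finance Centre', 'Guangzhou CTF Finance Centre',
                     'Shanghai World Financial Center',
                     'Shanghai World Financial Center', 'Changsha IFS Tower T1'],
    'similarity_ratio': [51, 66, 59, 56, 57],
})

# Rename the lookup columns so they line up with each side of the pair.
own = lookup.rename(columns={'df_type': 'df_num'})
matched = lookup.rename(columns={'id': 'matched_id', 'city': 'matched_city',
                                 'name': 'matched_name',
                                 'df_type': 'matched_df_num'})

# Two left joins: the first adds id, the second adds matched_id.
result = (combined
          .merge(own, on=['df_num', 'city', 'name'], how='left')
          .merge(matched,
                 on=['matched_df_num', 'matched_city', 'matched_name'],
                 how='left'))
print(result[['name', 'id', 'matched_name', 'matched_id']])
```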
