简体   繁体   中英

Merging two varying sized data frames on 2 columns in Python

I am looking to merge 2 dataframes in python, df1 and df2, on 2 columns, Site and Building, with different row quantities as a way of attaining a "safe" value for each generator value in df1. Below is a demonstration code, though I've created the dataframes in the below example (which appears to work), the data for each table in the actual issue comes from SQL queries, which leads me to believe the merge is having issues due to data type.

import pandas as pd

df = {'Site': ['Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Greece','Greece','Greece','Greece','Greece','Greece'],
        'Building' : ['X1','X1','X1','X1','X1','X1','X2','X2','X2','X2','X2','X2','X3','X3','X3','X3','X3','X3','X4','X4','X4','X4','X4',   'X4','X5','X5','X5','X5','X5','X5','X1','X1',   'X1','X1',  'X1','X1','X2','X2','X2','X2','X2','X2','X3','X3','X3','X3','X3','X3','X1', 'X1','X1',  'X1','X1',  'X1'],
        'Generator' : ['DE','NDE',  'GBX1','GBX2','GBX3','GBX4','DE','NDE','GBX1','GBX2','GBX3','GBX4','DE','NDE',  'GBX1','GBX2','GBX3','GBX4','DE','NDE','GBX1','GBX2','GBX3','GBX4','DE','NDE','GBX1','GBX2','GBX3','GBX4',  'DE','NDE','GBX1','GBX2','GBX3','GBX4','DE',    'NDE','GBX1','GBX2','GBX3','GBX4',  'DE','NDE','GBX1','GBX2','GBX3','GBX4','DE','NDE','GBX1','GBX2','GBX3','GBX4']}

df1 = pd.DataFrame(df1, columns = ['Site', 'Building', 'Generator'])


df15 = {'Building' : ['X1','X2','X3','X4','X5','X1','X2','X3','X1'],
        'Site': ['Belgium','Belgium','Belgium','Belgium','Belgium','Holland','Holland','Holland','Greece'],
        'Safe' : [1,    1,  1,  1,  1,  0,  1,  1,  0]}


df2 = pd.DataFrame(df15, columns = ['Site', 'Building', 'Safe'])


df3 = df1.merge(df2, how = 'left', on = ['Site', 'Building'], indicator = True)

Desired outcome

I've also tried changing the data types of each to string as per pandas - Merging on string columns not working (bug?) ,

df1['Site'] = df1['Site'].astype('str')
df1['Building']=df1['Building'].astype('str')
df1['Site'] = df1['Site'].astype('str')
df1['Building']=df1['Building'].astype('str')

and also the step that mentions checking encoding as per below, but all seem to match ie. there are no visible dependencies in content;

df1['Building'] = df1['Building'].str.encode('UTF-8')
df1['Site'] = df1['Site'].str.encode('UTF-8')

Datatypes:

df2.datatypes:

    Site                           object
    Building                       object
    Safe                           object
    dtype: object


    df1.datatypes:


    Building      object
    Site          object
    Generator     object
    dtype:        object

I've tried the below code:

df3 = df1.merge(df2, left_on = ['Site', 'Building'], right_on = ['Site', 'Building'], how = 'left', indicator = 'indicator')

or:

df3 = df1.merge(df2, on = ['Site', 'Building'], how = 'left', indicator = 'indicator')

but the outcome ends up with only the data from left ie outcome 1.

Undesired outcome 1

I've tried outer join as per below, which yields outcome 2:

df3 = df1.merge(df2, on = ['Site', 'Building'], how = 'outer', indicator = 'indicator')

Undesired Outcome 2

Apologies for my relative ignorance with respect to pandas.

I noticed a small error in the code you shared.

df1 = pd.DataFrame(df1, columns = ['Site', 'Building', 'Generator']) should be df1 = pd.DataFrame(df, columns = ['Site', 'Building', 'Generator']) . The variable that you should be passing to pd.Dataframe should be df and not df1 .

After this step, simply doing a merge using the dataframes gives the desired outcome

pd.merge(df1,df2, on=['Building','Site'])

Output looks like :

       Site Building Generator  Safe
0   Belgium       X1        DE     1
1   Belgium       X1       NDE     1
2   Belgium       X1      GBX1     1
3   Belgium       X1      GBX2     1
4   Belgium       X1      GBX3     1
5   Belgium       X1      GBX4     1
6   Belgium       X2        DE     1
7   Belgium       X2       NDE     1
8   Belgium       X2      GBX1     1
9   Belgium       X2      GBX2     1
10  Belgium       X2      GBX3     1
11  Belgium       X2      GBX4     1
12  Belgium       X3        DE     1

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM