Pandas merge filling new dataframe with null values

I am trying to merge two dataframes:

  • The first dataframe, control, is filled with integers and strings.
  • The left dataframe, together, is filled with integers and lists.

When I use the pandas merge() function, the new dataframe fills the columns from the right dataframe with NaN rather than the lists:

final_dataset = pd.merge(control, together, on="zip_code", how="left")

I expect a new merged dataframe with the values from the two original dataframes. Instead, in the new dataframe, all of the values from the "control" dataframe are correct, but all of the lists from the "together" dataframe are NaN.

Here is some sample data:

control                                       together
-------------------------------              -------------------------------
payment             zip_code                   age                  zip_code
   Rent                 94053                    [25, 64, 24]         12583
   Mortgage             47283                    [78, 39, 35]         47283
   Rent                 25769                    [82, 33, 19]         25769

Here is what the final dataset looks like:

final_dataset
-----------------------------------------------------------
zip_code             payment                 age                  
47283                  Mortgage               NaN                 
25769                  Rent                   NaN                                

I think you have a few things going on here. When you say "the left dataframe", I assume you mean the one that should be left-joined, right? You don't mean that 'together' is on the left side in the sample?

I think it is safe to assume that zip_code in 'together' is stored as a string rather than an int. You are getting the NaNs because the keys do not match across the two dataframes: for example, 47283 does not equal '47283'.
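
One way to check this guess is to compare the key columns directly before merging. The sketch below assumes, as guessed above and not confirmed by the question, that zip_code holds strings in 'together' and integers in 'control'; the names raw_overlap and str_overlap are only for illustration.

```python
import pandas as pd

control = pd.DataFrame({
    'payment': ['Rent', 'Mortgage', 'Rent'],
    'zip_code': [94053, 47283, 25769]
})
# Assumption: zip_code was loaded as strings in this dataframe
together = pd.DataFrame({
    'age': [[25, 64, 24], [78, 39, 35], [82, 33, 19]],
    'zip_code': ['12583', '47283', '25769']
})

# Inspect the dtype of each key column
print(control['zip_code'].dtype)
print(together['zip_code'].dtype)

# The raw key values share nothing, because 47283 != '47283'
raw_overlap = set(control['zip_code']) & set(together['zip_code'])

# Compared as strings, the keys that were expected to match show up
str_overlap = (set(control['zip_code'].astype(str))
               & set(together['zip_code'].astype(str)))

print(raw_overlap)
print(str_overlap)
```

An empty raw overlap next to a non-empty string overlap points to a dtype mismatch rather than genuinely missing keys.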

Also, if you want a left join with 'together' on the left, you should get one NaN in payment, since only two zip_codes match once both columns have the same datatype.
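
To illustrate that case, here is a minimal sketch with 'together' on the left. It assumes zip_code is an integer in both dataframes, as in the sample data; the variable name result is only for illustration.

```python
import pandas as pd

control = pd.DataFrame({
    'payment': ['Rent', 'Mortgage', 'Rent'],
    'zip_code': [94053, 47283, 25769]
})
together = pd.DataFrame({
    'age': [[25, 64, 24], [78, 39, 35], [82, 33, 19]],
    'zip_code': [12583, 47283, 25769]
})

# Left join that keeps every row of 'together'.
# 12583 has no match in 'control', so its payment is NaN.
result = together.merge(control, on='zip_code', how='left')
print(result)
```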

Here is how I would recommend doing it if you want control on the left (I think you do):

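import pandas as pd

# 'control' holds the payment type for each zip code
control = pd.DataFrame({
    'payment': ['Rent', 'Mortgage', 'Rent'],
    'zip_code': [94053, 47283, 25769]
})
# 'together' holds a list of ages for each zip code
together = pd.DataFrame({
    'age': [[25, 64, 24], [78, 39, 35], [82, 33, 19]],
    'zip_code': [12583, 47283, 25769]
})

# Left join: keep every row of 'control' and attach the matching 'age' lists
control.merge(together, on='zip_code', how='left')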

This will give you the following results:

    payment  zip_code           age
0      Rent     94053           NaN
1  Mortgage     47283  [78, 39, 35]
2      Rent     25769  [82, 33, 19]

As you can see, there is one NaN in age because 94053 is not in the 'together' DataFrame.
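
If it is unclear which rows found a partner, the indicator parameter of merge adds a _merge column that labels each row. This is a small sketch on the same sample data, assuming integer zip codes on both sides; the name checked is only for illustration.

```python
import pandas as pd

control = pd.DataFrame({
    'payment': ['Rent', 'Mortgage', 'Rent'],
    'zip_code': [94053, 47283, 25769]
})
together = pd.DataFrame({
    'age': [[25, 64, 24], [78, 39, 35], [82, 33, 19]],
    'zip_code': [12583, 47283, 25769]
})

# indicator=True adds a '_merge' column: 'both' for matched rows,
# 'left_only' for rows of 'control' with no match in 'together'
checked = control.merge(together, on='zip_code', how='left', indicator=True)
print(checked[['zip_code', '_merge']])
```

If every row comes back as left_only, the keys are not matching at all, which is the symptom described in the question.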

This can happen if the zip_code columns have different types in the two dataframes, for example if one is int64 and the other is object:

import pandas as pd

a = pd.DataFrame([
    {"colA": 1, "key": "1"},
    {"colA": 2, "key": "2"},
    {"colA": 3, "key": "3"}
])

b = pd.DataFrame([
    {"colB": [25, 64, 24], "key": 1},
    {"colB": [25, 64, 24], "key": 2},
    {"colB": [25, 64, 24], "key": 4}
])

If you merge these two dataframes with

res = pd.merge(a, b, on="key", how='left')

you'll get:

   colA key colB
0   1   1   NaN
1   2   2   NaN
2   3   3   NaN

So you need to make sure that zip_code has the same type in both dataframes.
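
A minimal sketch of that fix on the a and b example: cast the string key to an integer before merging. Casting with astype is just one option and is my assumption about what fits here. Depending on the pandas version, merging a string key against an integer key may also raise a ValueError instead of returning NaN, so aligning the types first is the safer route either way.

```python
import pandas as pd

a = pd.DataFrame({'colA': [1, 2, 3], 'key': ['1', '2', '3']})
b = pd.DataFrame({
    'colB': [[25, 64, 24], [25, 64, 24], [25, 64, 24]],
    'key': [1, 2, 4]
})

# Align the key dtypes: convert the string keys in 'a' to integers
a['key'] = a['key'].astype('int64')

# Keys 1 and 2 now match; key 3 has no partner in 'b' and stays NaN
res = pd.merge(a, b, on='key', how='left')
print(res)
```

For real zip codes, casting both columns to str instead may be preferable, since an integer cast drops any leading zeros.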
