I am trying to merge two dataframes, control and together. control is filled with integers/strings, and together contains lists. When I use the pandas merge() function, the new dataframe fills the columns from the right dataframe with NaN rather than the lists:
final_dataset = pd.merge(control, together, on="zip_code", how="left")
I expect a new merged dataframe with the values from the two original dataframes. Instead, in the new dataframe, all of the values from the "control" dataframe are correct, but all of the lists from the "together" dataframe are NaN.
Here is some sample data:
control                    together
-----------------------    ---------------------------
payment     zip_code       age             zip_code
Rent        94053          [25, 64, 24]    12583
Mortgage    47283          [78, 39, 35]    47283
Rent        25769          [82, 33, 19]    25769
Here is what the final dataset looks like:
final_dataset
-------------------------------
zip_code    payment     age
47283       Mortgage    NaN
25769       Rent        NaN
I think you have a few things going on here. When you say the left dataframe, I assume you mean the dataframe on the left side of the left join, right? You don't mean that 'together' is on the left side in the sample?
I think it is safe to assume that zip_code in 'together' is a string, not an int. You are getting the NaNs because the keys do not match across the two dataframes: for example, 47283 does not equal '47283'.
Also, if you want a left join with 'together' on the left, you should get one NaN in payment, since only two zip_codes match once they have the same datatype.
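A quick way to confirm the suspicion above is to compare the dtypes of the key column on both sides before merging. This is only a sketch: the frames below are made up to mimic the question, and storing the zip codes in 'together' as strings is an assumption about the asker's real data.

```python
import pandas as pd

# Hypothetical frames mimicking the question. The string zip codes in
# `together` are an assumption, not something the question confirms.
control = pd.DataFrame({
    "payment": ["Rent", "Mortgage", "Rent"],
    "zip_code": [94053, 47283, 25769],
})
together = pd.DataFrame({
    "age": [[25, 64, 24], [78, 39, 35], [82, 33, 19]],
    "zip_code": ["12583", "47283", "25769"],
})

# Compare the dtype of the merge key in each dataframe.
left_dtype = control["zip_code"].dtype
right_dtype = together["zip_code"].dtype
print(left_dtype, right_dtype)

# If this is False, the keys cannot match even when they look identical.
same_dtype = bool(left_dtype == right_dtype)
print("key dtypes match:", same_dtype)
```

If the two dtypes differ, the keys need to be converted to a common type before the merge can find any matches.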
Here is how I would recommend doing it if you want control on the left (I think you do):
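import pandas as pd

# zip_code is stored as an integer in both dataframes
control = pd.DataFrame({
    'payment': ['Rent', 'Mortgage', 'Rent'],
    'zip_code': [94053, 47283, 25769]
})
together = pd.DataFrame({
    'age': [[25, 64, 24], [78, 39, 35], [82, 33, 19]],
    'zip_code': [12583, 47283, 25769]
})
# Left join: keep every row of control
control.merge(together, on='zip_code', how='left')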
This will give you the following results:
    payment  zip_code           age
0      Rent     94053           NaN
1  Mortgage     47283  [78, 39, 35]
2      Rent     25769  [82, 33, 19]
As you can see, you have one NaN in age because 94053 is not in the 'together' DataFrame.
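If it is not obvious which rows failed to match, the indicator parameter of merge() can help. The sketch below reuses the integer-keyed sample data from this answer. The names checked and unmatched are just illustrative.

```python
import pandas as pd

control = pd.DataFrame({
    "payment": ["Rent", "Mortgage", "Rent"],
    "zip_code": [94053, 47283, 25769],
})
together = pd.DataFrame({
    "age": [[25, 64, 24], [78, 39, 35], [82, 33, 19]],
    "zip_code": [12583, 47283, 25769],
})

# indicator=True adds a `_merge` column: "both" when the key was found
# in both dataframes, "left_only" when it exists only in `control`.
checked = control.merge(together, on="zip_code", how="left", indicator=True)

# Collect the zip codes from `control` that had no partner in `together`.
unmatched = checked.loc[checked["_merge"] == "left_only", "zip_code"].tolist()
print(unmatched)
```

If every row comes back as left_only even though the keys look the same, a dtype mismatch on the key column is a likely cause.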
This can happen if the zip_code columns have different types in each dataframe; maybe one of them is int64 and the other is object. For example:
import pandas as pd

# "key" holds strings in a and integers in b
a = pd.DataFrame([
    {"colA": 1, "key": "1"},
    {"colA": 2, "key": "2"},
    {"colA": 3, "key": "3"}
])
b = pd.DataFrame([
    {"colB": [25, 64, 24], "key": 1},
    {"colB": [25, 64, 24], "key": 2},
    {"colB": [25, 64, 24], "key": 4}
])
If you merge these two dataframes, you'll get:
res = pd.merge(a, b, on="key", how='left')
   colA key colB
0     1   1  NaN
1     2   2  NaN
2     3   3  NaN
So you need to make sure that zip_code has the same type in the two dataframes.
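One way to do that is to cast one of the key columns with astype() before merging. This is a minimal sketch using the a and b frames from this answer, and it assumes every string key is a valid integer. Casting b["key"] to str instead should work as well. Note that, depending on the pandas version, merging an object column with an int64 column may raise a ValueError rather than silently returning NaN, so the cast is worth doing either way.

```python
import pandas as pd

a = pd.DataFrame({"colA": [1, 2, 3], "key": ["1", "2", "3"]})
b = pd.DataFrame({
    "colB": [[25, 64, 24], [25, 64, 24], [25, 64, 24]],
    "key": [1, 2, 4],
})

# Cast the string keys to integers so both key columns share one dtype.
# This assumes every value in a["key"] parses as an integer.
a["key"] = a["key"].astype("int64")

res = pd.merge(a, b, on="key", how="left")
print(res)
```

After the cast, keys 1 and 2 pick up their lists from b, and only key 3, which has no counterpart in b, is left with NaN.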