Merging two DataFrames in Pandas results in NaNs in the new merged DF

I have two DataFrames in Pandas I want to join together (I think merge), and when I do, the resultant DataFrame has all NaN for the right part of the new DataFrame. Here's a simplified schematic:

DF_Left

     station_name     trips    date_zip
0    Mountain View     100   95113 2013-08-29
1    San Francisco     190   95113 2012-04-12
2    San Jose          109   94107 2013-09-01

DF_Right

      max_temperature     wind_speed   date_zip
0      79                   2       95113 2013-08-29
1      67                   3       95113 2012-04-12
2      64                   1       94107 2013-09-01

There are about 40K rows on the left and 1,500 on the right. What I want to do is merge the two so that the columns of DF_Right are added to DF_Left based on the date_zip column. So what I really want is

DF_Correct

     station_name     trips    date_zip         max_temperature   wind_speed
0    Mountain View     100   95113 2013-08-29   79                     2                          
1    San Francisco     190   95113 2012-04-12   67                     3                     
2    San Jose          109   94107 2013-09-01   64                     1

When I do

DF_Correct = pd.merge(DF_Left, DF_Right, left_on=['date_zip'], right_on=['date_zip'], how='left')

I get what I wanted, except all of the weather columns are now NaNs. I'm not sure about the terminology here, so I think merge is what I want, but I'm not sure what's happening to my data.

Please inspect your data to make sure the date_zip values and their dtypes are the same in both DataFrames. The code below rebuilds your sample (with one extra row on the left), and the merge works as expected:

import pandas as pd
df1 = pd.DataFrame({'station_name': ['Mountain View','San Francisco','San Jose','San Jose'],
                   'trips': [100,190,109,110],
                   'date_zip': ['95113 2013-08-29','95113 2012-04-12','94107 2013-09-01','94107 2013-09-02']})
df2 = pd.DataFrame({'wind_speed': [2,3,1],
                   'max_temperature': [79,67,64],
                   'date_zip': ['95113 2013-08-29','95113 2012-04-12','94107 2013-09-01']})

DF_Correct = pd.merge(df1, df2, on='date_zip', how='left')
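If the real data still produces NaNs, the keys are probably not matching exactly. Here is a minimal sketch of how that could be checked. It assumes the mismatch is stray whitespace in date_zip, which is only a guess about the real data; the frame names left and right and the trailing space are made up for illustration. The indicator=True option adds a _merge column that shows which rows found a partner.

```python
import pandas as pd

# Hypothetical data: the left keys carry a trailing space (an assumed cause).
left = pd.DataFrame({
    "station_name": ["Mountain View", "San Francisco", "San Jose"],
    "trips": [100, 190, 109],
    "date_zip": ["95113 2013-08-29 ", "95113 2012-04-12 ", "94107 2013-09-01 "],
})
right = pd.DataFrame({
    "max_temperature": [79, 67, 64],
    "wind_speed": [2, 3, 1],
    "date_zip": ["95113 2013-08-29", "95113 2012-04-12", "94107 2013-09-01"],
})

# Step 1: compare the key dtypes; a string key never matches an int or datetime key.
print(left["date_zip"].dtype, right["date_zip"].dtype)

# Step 2: merge with indicator=True to see which rows matched.
broken = pd.merge(left, right, on="date_zip", how="left", indicator=True)
print(broken["_merge"].value_counts())

# Step 3: normalize the keys on both sides, then merge again.
left["date_zip"] = left["date_zip"].astype(str).str.strip()
right["date_zip"] = right["date_zip"].astype(str).str.strip()
fixed = pd.merge(left, right, on="date_zip", how="left", indicator=True)
print(fixed)
```

If _merge shows left_only for every row, no key on the left equals any key on the right, and the weather columns will be all NaN.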

From what I understand of the question so far, the code below should give the desired answer.

DF_Correct = pd.merge(DF_Right, DF_Left, how='left', on='date_zip')
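Note that with how='left' the order of the two frames matters: the result keeps every row of the first frame and puts its columns first. A small sketch of the difference, using made-up frames stations and weather modeled on the sample data in the question:

```python
import pandas as pd

# Hypothetical frames; the fourth station row has no weather record.
stations = pd.DataFrame({
    "station_name": ["Mountain View", "San Francisco", "San Jose", "San Jose"],
    "trips": [100, 190, 109, 110],
    "date_zip": ["95113 2013-08-29", "95113 2012-04-12",
                 "94107 2013-09-01", "94107 2013-09-02"],
})
weather = pd.DataFrame({
    "max_temperature": [79, 67, 64],
    "wind_speed": [2, 3, 1],
    "date_zip": ["95113 2013-08-29", "95113 2012-04-12", "94107 2013-09-01"],
})

# Stations first: all four station rows are kept; the unmatched one gets NaN weather.
stations_first = pd.merge(stations, weather, how="left", on="date_zip")

# Weather first: only the three weather rows are kept; the unmatched station is dropped.
weather_first = pd.merge(weather, stations, how="left", on="date_zip")

print(len(stations_first), len(weather_first))
```

So if the goal is to keep all of the roughly 40K rows from DF_Left, DF_Left would need to be the first argument.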
