
Merge two pandas dataframes with conditions

I have two dataframes that I need to merge on complex conditions. Here are the two dataframes:

     dock_id     dock_name               avail_bikes    avail_docks  \
0    3082        Hope St & Union Ave     8              16   
1    468         Broadway & W 55 St      0              59   
2    407         Henry St & Poplar St    22             15   
3    3016        Kent Ave & N 7 St       29             16   

    status_key   datehour             ...    visi  vism   wdird     wdire  \
0   1            2016-06-01 19:25:00  ...    NaN   NaN    NaN       NaN   
1   1            2016-06-01 19:25:00  ...    NaN   NaN    NaN       NaN   
2   1            2016-06-01 19:25:00  ...    NaN   NaN    NaN       NaN   
3   1            2016-06-01 19:25:00  ...    NaN   NaN    NaN       NaN    

     tot_docks    _lat               _long         in_service  
0    25           40.711674          -73.951413    1   
1    59           40.765265          -73.981923    1   
2    37           40.700469          -73.991454    1   
3    47           40.720368          -73.961651    1   

.... ....

        Start Date/Time         End Date/Time            Event Agency  \
0       01/01/2016 12:00:00 AM  01/01/2016 02:00:00 AM   Parks Department   
1       01/02/2016 12:00:00 AM  01/02/2016 02:00:00 AM   Parks Department   
2       01/03/2016 12:00:00 AM  01/03/2016 02:00:00 AM   Parks Department   
3       01/04/2016 12:00:00 AM  01/04/2016 02:00:00 AM   Parks Department   

        latitude   longitude  
0       40.782865  -73.965355  
1       40.782865  -73.965355  
2       40.782865  -73.965355  
3       40.782865  -73.965355  
4       40.782865  -73.965355 

I would like to join them on the following conditions:

Start Date/Time <= datehour <= End Date/Time and distance(_lat, _long, latitude, longitude) < d
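The question does not define distance. For latitude/longitude pairs, one common choice is the haversine great-circle distance, sketched below. The function name haversine_km, the kilometre unit, and the mean Earth radius of 6371 km are assumptions made for illustration, not part of the original post.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed unit: kilometres)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees.

    Works on scalars, NumPy arrays, or pandas Series (element-wise).
    """
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Distance between the first dock and the event location in the sample data
dist = haversine_km(40.711674, -73.951413, 40.782865, -73.965355)
```

With this definition, d in the condition above would be a threshold in kilometres.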

I know it is possible to merge the data and then apply a filter to it, but the datasets are too big (10263241 rows and 401080 rows), so I don't think this method would run in a reasonable time.

Do you have any idea how we could solve this problem?

Thanks for your answers!

import pandas as pd ... new_frame = pd.merge(dataframe1, dataframe2, on=[...]) (here on takes the names of the key columns to match on, not an arbitrary condition)

For a more advanced merge, we can also select specific columns first, e.g. dataframe[['column1', 'column2', ...]].
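pd.merge matches rows on equal key values, so the time-range and distance conditions from the question cannot be passed to it directly. One possible workaround is sketched below. It expands each event into one row per calendar day it touches, equality-merges the docks to events on that day, and then applies the exact conditions to the much smaller set of candidate pairs, one chunk of docks at a time to bound memory. The names conditional_join, haversine_km, max_km and chunk_size are hypothetical. The sketch assumes the three date columns have already been parsed with pd.to_datetime, that the distance is a haversine distance in kilometres, and that docks is non-empty. Whether it is fast enough depends on how many events fall on each day.

```python
import numpy as np
import pandas as pd


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (assumed meaning of `distance`)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))


def conditional_join(docks, events, max_km, chunk_size=100_000):
    """Pair dock rows with events where
    Start Date/Time <= datehour <= End Date/Time and distance < max_km."""
    # One row per (event, calendar day the event touches).
    ev = events.copy()
    ev["day"] = [
        list(pd.date_range(start.normalize(), end.normalize(), freq="D"))
        for start, end in zip(ev["Start Date/Time"], ev["End Date/Time"])
    ]
    ev = ev.explode("day")
    ev["day"] = pd.to_datetime(ev["day"])

    pieces = []
    for lo in range(0, len(docks), chunk_size):
        chunk = docks.iloc[lo:lo + chunk_size]
        chunk = chunk.assign(day=chunk["datehour"].dt.normalize())
        # Equality join on the day bucket yields candidate pairs only.
        cand = chunk.merge(ev, on="day")
        in_window = ((cand["Start Date/Time"] <= cand["datehour"])
                     & (cand["datehour"] <= cand["End Date/Time"]))
        near = haversine_km(cand["_lat"], cand["_long"],
                            cand["latitude"], cand["longitude"]) < max_km
        pieces.append(cand[in_window & near].drop(columns="day"))
    return pd.concat(pieces, ignore_index=True)
```

A call would look like conditional_join(docks, events, max_km=0.5). Using a finer bucket (for example hours instead of days) would shrink the candidate set further at the cost of more expanded event rows.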
