Join two tables with a range criteria using Python Pandas

I have a problem similar to this simplified version:

The experiment results are saved in an Excel sheet. I processed the data with Python Pandas and converted it to DataFrames.

Two tables are given below: Table_Race is stored in the DataFrame race, and Table_standard is stored in the DataFrame std.

>>> import pandas as pd
>>> data = [["Gold+",1,30,35],["Silver+",1,25,30],["Bronze+",1,20,25],["Gold",2,20,25],["Silver",2,15,20],["Bronze",2,10,15]]
>>> std = pd.DataFrame(data,columns=['Title','League','Start','End'])
>>> std
     Title  League  Start  End
0    Gold+       1     30   35
1  Silver+       1     25   30
2  Bronze+       1     20   25
3     Gold       2     20   25
4   Silver       2     15   20
5   Bronze       2     10   15
>>> data = [["John",1,26],["Ryan",1,33],["Mike",1,9],["Jo",2,21],["Riko",2,15],["Kiven",2,13]]
>>> race = pd.DataFrame(data,columns=['Name','League','Distance'])
>>> race
    Name  League  Distance
0   John       1        26
1   Ryan       1        33
2   Mike       1         9
3     Jo       2        21
4   Riko       2        15
5  Kiven       2        13
>>> 

I would like to check the distance for each player and get their title according to the standards:

    Title <= Distance in [Start, End), and League must match

For example: Riko is from league 2 and has distance 15, which falls in [15, 20). Note that it is not in [10, 15), hence he gets the title 'Silver'.

The expected result is as follows:

    Name    League  Distance    Title
    John    1       26          Silver+
    Ryan    1       33          Gold+
    Mike    1       9           N/A
    Jo      2       21          Gold
    Riko    2       15          Silver
    Kiven   2       13          Bronze

I can achieve this using two loops, which basically take each (League, Distance) pair (l, d) from Table_race and search every row of Table_standard for a match.

The condition being checked is:

    l == League && d >= Start && d < End

But this method is O(N^2), which is too slow: my data can easily exceed 100,000 rows, and it takes hours to finish.
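
For reference, the two-loop approach described above might look roughly like the sketch below. The helper name `find_title` is hypothetical, and the two DataFrames are rebuilt from the tables shown earlier so the snippet runs on its own.

```python
import pandas as pd

std = pd.DataFrame(
    [["Gold+", 1, 30, 35], ["Silver+", 1, 25, 30], ["Bronze+", 1, 20, 25],
     ["Gold", 2, 20, 25], ["Silver", 2, 15, 20], ["Bronze", 2, 10, 15]],
    columns=["Title", "League", "Start", "End"],
)
race = pd.DataFrame(
    [["John", 1, 26], ["Ryan", 1, 33], ["Mike", 1, 9],
     ["Jo", 2, 21], ["Riko", 2, 15], ["Kiven", 2, 13]],
    columns=["Name", "League", "Distance"],
)


def find_title(league, distance):
    """Scan every row of std (inner loop) for a matching league and range."""
    for row in std.itertuples(index=False):
        if league == row.League and row.Start <= distance < row.End:
            return row.Title
    return None  # no standard covers this distance


# Outer loop: one full scan of std per row of race.
titles = [find_title(l, d) for l, d in zip(race["League"], race["Distance"])]
print(race.assign(Title=titles))
```

Every row of race triggers a scan of std, so the work grows with the product of the two table sizes.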

Any better solutions?

Still working on a solution, but here is something to start with:

>>> import pandas as pd
>>> data = [["Gold+",1,30,35],["Silver+",1,25,30],["Bronze+",1,20,25],["Gold",2,20,25],["Silver",2,15,20],["Bronze",2,10,15]]
>>> std = pd.DataFrame(data,columns=['Title','League','Start','End'])
>>> std
     Title  League  Start  End
0    Gold+       1     30   35
1  Silver+       1     25   30
2  Bronze+       1     20   25
3     Gold       2     20   25
4   Silver       2     15   20
5   Bronze       2     10   15

>>> data = [["John",1,26],["Ryan",1,33],["Mike",1,9],["Jo",2,21],["Riko",2,15],["Kiven",2,13]]
>>> race = pd.DataFrame(data,columns=['Name','League','Distance'])
>>> race
    Name  League  Distance
0   John       1        26
1   Ryan       1        33
2   Mike       1         9
3     Jo       2        21
4   Riko       2        15
5  Kiven       2        13
>>> result=pd.merge(race,std,on='League')
>>> result = result[(result.Distance >= result.Start)&(result.Distance < result.End)][["Name","League","Distance","Title"]]
>>> result
     Name  League  Distance    Title
1    John       1        26  Silver+
3    Ryan       1        33    Gold+
9      Jo       2        21     Gold
13   Riko       2        15   Silver
17  Kiven       2        13   Bronze 
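
Note that Mike is missing from the result above, because his distance falls in no range and the filter drops him. One possible way to keep such players, sketched here under the assumption that the ranges within a league never overlap, is to carry the original row index through the merge and assign the matched titles back to race. The names `merged`, `in_range`, and `hits` are illustrative only.

```python
import pandas as pd

std = pd.DataFrame(
    [["Gold+", 1, 30, 35], ["Silver+", 1, 25, 30], ["Bronze+", 1, 20, 25],
     ["Gold", 2, 20, 25], ["Silver", 2, 15, 20], ["Bronze", 2, 10, 15]],
    columns=["Title", "League", "Start", "End"],
)
race = pd.DataFrame(
    [["John", 1, 26], ["Ryan", 1, 33], ["Mike", 1, 9],
     ["Jo", 2, 21], ["Riko", 2, 15], ["Kiven", 2, 13]],
    columns=["Name", "League", "Distance"],
)

# reset_index() stores race's row labels in a column named "index",
# so each merged row remembers which race row it came from.
merged = race.reset_index().merge(std, on="League")

# Keep only the pairs where Distance lies in the half-open range [Start, End).
in_range = (merged["Distance"] >= merged["Start"]) & (merged["Distance"] < merged["End"])
hits = merged.loc[in_range].set_index("index")["Title"]

# Assignment aligns on the index; rows without a match get NaN.
result = race.assign(Title=hits)
print(result)
```

This still builds every race/std pair within a league before filtering, so it addresses the missing rows but not the memory cost of the merge.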

See the Merge and Multiple conditions links for tutorials on each and their drawbacks.
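
If the intermediate merge grows too large, an alternative worth trying is `pd.merge_asof`, which matches each distance to the nearest `Start` at or below it within the same league. This is a sketch, not a benchmarked solution, and it is a different technique from the merge-and-filter answer above. It assumes the ranges within a league do not overlap, and the names `left`, `right`, and `matched` are illustrative.

```python
import pandas as pd

std = pd.DataFrame(
    [["Gold+", 1, 30, 35], ["Silver+", 1, 25, 30], ["Bronze+", 1, 20, 25],
     ["Gold", 2, 20, 25], ["Silver", 2, 15, 20], ["Bronze", 2, 10, 15]],
    columns=["Title", "League", "Start", "End"],
)
race = pd.DataFrame(
    [["John", 1, 26], ["Ryan", 1, 33], ["Mike", 1, 9],
     ["Jo", 2, 21], ["Riko", 2, 15], ["Kiven", 2, 13]],
    columns=["Name", "League", "Distance"],
)

# merge_asof requires both frames to be sorted on their join keys.
# reset_index() keeps the original row order in an "index" column.
left = race.reset_index().sort_values("Distance")
right = std.sort_values("Start")

# For each race row, take the last std row of the same League
# whose Start <= Distance (the default "backward" direction).
matched = pd.merge_asof(left, right, left_on="Distance", right_on="Start", by="League")

# Enforce the upper bound: blank the title unless Distance < End.
matched["Title"] = matched["Title"].where(matched["Distance"] < matched["End"])

# Restore the original row order and drop the helper columns.
result = (
    matched.sort_values("index")
    .set_index("index")
    .rename_axis(None)[["Name", "League", "Distance", "Title"]]
)
print(result)
```

Because no race/std cross product is built, the cost should be dominated by the two sorts rather than by the product of the table sizes.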
