
Looping a function in a data frame

My project: I'm creating an Elo rating for tennis players, and I have two different data frames.

(1) A dataframe of players with their ratings (2) A dataframe of matches ordered chronologically

Working through the match dataframe, I would like to retrieve the ratings of both players and apply two functions that I have already defined: predicted_result(rating1, rating2) and updated_rating(rating1, rating2). The first gives me the expected result of the match given the ratings; the second gives me the updated ratings. Finally, I need to record the updated ratings in the player dataframe.

I think what I'm looking for is a loop that goes line by line and:

  • starting from the first line of the match dataframe, retrieves both
    players' ratings from the player dataframe
  • runs both functions
  • replaces the old ratings with the updated ratings in the player dataframe.

Match Dataframe

    Winner    Loser   
0   Nadal     Federer   
1   Djokovic  Verdasco   
2   Nadal     Djokovic  
3   Del Potro Verdasco 

Player Dataframe

    Player  Rating   
0   Nadal     2320   
1   Djokovic  2280   
2   Verdasco  2120
3   Federer   1890     
4   Del Potro 1542

I found the answer below, which shows how to roll the formula down the rows, but I'm missing how to save the updated ratings in the player dataframe:

Rolling a function on a data frame

The principal issue here seems to be the unhelpful format of your ratings DataFrame. Since the purpose of the index is to make it easy to access rows by index value, the problem becomes much easier if you make the player name the index. Since I don't know how your ratings are calculated, I have assumed that winning increases a rating by one point and losing reduces it by one.

First I make sure I'm using the same data as you :)

In [154]: ratings
Out[154]:
      Player  Rating
0      Nadal    2320
1   Djokovic    2280
2   Verdasco    2120
3    Federer    1890
4  Del Potro    1542

In [155]: results
Out[155]:
      Winner     Loser
0      Nadal   Federer
1   Djokovic  Verdasco
2      Nadal  Djokovic
3  Del Potro  Verdasco

Next I make a copy of the ratings table with the player name as the index.

In [156]: ir = ratings.set_index(ratings["Player"].values)

I chose to then remove the original "Player" column, since it is now redundant. YMMV.

In [157]: del ir["Player"]

In [158]: ir
Out[158]:
           Rating
Nadal        2320
Djokovic     2280
Verdasco     2120
Federer      1890
Del Potro    1542
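As a side note, here is a minimal sketch of what the name index buys you. It rebuilds the same table and uses ratings.set_index("Player"), which moves the column into the index in one step (so no del is needed, and the index ends up named "Player"). The variable name nadal_rating is just for illustration.

```python
import pandas as pd

# Same player data as in the question
ratings = pd.DataFrame({
    "Player": ["Nadal", "Djokovic", "Verdasco", "Federer", "Del Potro"],
    "Rating": [2320, 2280, 2120, 1890, 1542],
})

# Move the "Player" column into the index in one step
ir = ratings.set_index("Player")

# Read one player's rating by name
nadal_rating = ir.loc["Nadal", "Rating"]

# Write a new rating back by name, in a single .loc call
ir.loc["Nadal", "Rating"] = nadal_rating + 1
```

Reading and writing through a single .loc[row_label, column_label] call keeps the assignment pointed at the original table rather than at a temporary copy.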

You can iterate over the values in a column of the results table:

In [159]: for row in results["Winner"]:
   .....:         print(row)
   .....:
Nadal
Djokovic
Nadal
Del Potro
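If you need the winner and the loser of each match at the same time (which a real rating update probably will), one option is to walk both columns together. This is only a sketch; the names pairs, winner, and loser are my own.

```python
import pandas as pd

# Same match data as in the question, in chronological order
results = pd.DataFrame({
    "Winner": ["Nadal", "Djokovic", "Nadal", "Del Potro"],
    "Loser": ["Federer", "Verdasco", "Djokovic", "Verdasco"],
})

# Pair up the two columns so each item is one match: (winner, loser)
pairs = list(zip(results["Winner"], results["Loser"]))

for winner, loser in pairs:
    print(f"{winner} beat {loser}")
```

Because zip walks the columns in row order, the matches are visited in the same chronological order as the dataframe.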

So it's now relatively simple to update your ratings:

In [160]: for row in results["Winner"]:
   .....:         ir.loc[row, "Rating"] += 1
   .....:

In [161]: for row in results["Loser"]:
   .....:         ir.loc[row, "Rating"] -= 1
   .....:

In [162]: ir
Out[162]:
           Rating
Nadal        2322
Djokovic     2280
Verdasco     2118
Federer      1889
Del Potro    1543
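To plug in your own functions instead of my +1/-1 placeholder, something like the following might work. Please treat it as a sketch under assumptions: I don't have your predicted_result and updated_rating, so the versions here are stand-ins that use the textbook Elo formula with an assumed K-factor of 32, and I have assumed updated_rating takes (winner rating, loser rating) and returns both new ratings. Swap in your real definitions.

```python
import pandas as pd

K = 32  # assumed K-factor; not given in the question


def predicted_result(rating1, rating2):
    """Stand-in: textbook Elo expected score of player 1 against player 2."""
    return 1.0 / (1.0 + 10 ** ((rating2 - rating1) / 400.0))


def updated_rating(winner_rating, loser_rating):
    """Stand-in: return (new winner rating, new loser rating) after one match."""
    expected_win = predicted_result(winner_rating, loser_rating)
    change = K * (1.0 - expected_win)
    return winner_rating + change, loser_rating - change


ratings = pd.DataFrame({
    "Player": ["Nadal", "Djokovic", "Verdasco", "Federer", "Del Potro"],
    "Rating": [2320, 2280, 2120, 1890, 1542],
})
results = pd.DataFrame({
    "Winner": ["Nadal", "Djokovic", "Nadal", "Del Potro"],
    "Loser": ["Federer", "Verdasco", "Djokovic", "Verdasco"],
})

# Ratings as a Series indexed by player name; floats so that
# fractional Elo updates can be stored without a dtype clash
elo = ratings.set_index("Player")["Rating"].astype(float)

# Process matches in order, so each match sees the ratings
# left behind by all earlier matches
for match in results.itertuples(index=False):
    new_winner, new_loser = updated_rating(elo.loc[match.Winner],
                                           elo.loc[match.Loser])
    elo.loc[match.Winner] = new_winner
    elo.loc[match.Loser] = new_loser

# Save the final ratings back into the original player dataframe
ratings["Rating"] = ratings["Player"].map(elo)
```

The last line is the "save" step: map looks up each name in the updated Series, so the player dataframe keeps its original shape and row order.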
