Looping a function in a data frame
My project: I'm creating an Elo rating for tennis players. I have two different data frames:
(1) A data frame of players with their ratings
(2) A data frame of matches, ordered chronologically
Working through the matches data frame, I would like to retrieve the ratings of both players and apply two functions that I have already defined: predicted_result(rating1, rating2) and updated_rating(rating1, rating2). The first gives me the expected result of the match given the ratings; the second gives me the updated ratings. Finally, I need to record the updated ratings in the player data frame.
I think what I'm looking for is a loop that goes through the matches line by line. The two data frames look like this:
Winner Loser
0 Nadal Federer
1 Djokovic Verdasco
2 Nadal Djokovic
3 Del Potro Verdasco
Player Rating
0 Nadal 2320
1 Djokovic 2280
2 Verdasco 2120
3 Federer 1890
4 Del Potro 1542
I found the answer below, which shows how to apply the formula down the rows, but I'm missing how to save the updated ratings in the player data frame.
The principal issue here seems to be the unhelpful format of your ratings DataFrame. Since the purpose of the index is to make it easy to access rows by index value, the problem becomes much easier if you make the player name the index. Since I don't know how ratings are calculated, I have assumed that winning increases the rating by one point and losing reduces it by one.
First I make sure I'm using the same data as you :)
In [154]: ratings
Out[154]:
Player Rating
0 Nadal 2320
1 Djokovic 2280
2 Verdasco 2120
3 Federer 1890
4 Del Potro 1542
In [155]: results
Out[155]:
Winner Loser
0 Nadal Federer
1 Djokovic Verdasco
2 Nadal Djokovic
3 Del Potro Verdasco
Next I make a copy of the ratings table with the Player as the index.
In [156]: ir = ratings.set_index(ratings["Player"].values)
I then chose to remove the original "Player" column, since it is now redundant. YMMV.
In [157]: del ir["Player"]
In [158]: ir
Out[158]:
Rating
Nadal 2320
Djokovic 2280
Verdasco 2120
Federer 1890
Del Potro 1542
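With the player name as the index, a single rating can be read or written by label through `.loc`. The following is only a minimal sketch: it rebuilds the `ir` frame from the data above, and works on a copy (the name `demo` is made up for illustration) so the walkthrough's numbers are left untouched.

```python
import pandas as pd

# Same ratings as above, indexed by player name.
ir = pd.DataFrame(
    {"Rating": [2320, 2280, 2120, 1890, 1542]},
    index=["Nadal", "Djokovic", "Verdasco", "Federer", "Del Potro"],
)

# Read one rating by player name.
nadal_rating = ir.loc["Nadal", "Rating"]

# Write one rating by player name, on a throwaway copy.
demo = ir.copy()
demo.loc["Federer", "Rating"] += 1
```

Using a single `.loc[row_label, column_label]` call for the write avoids chained indexing, which pandas may apply to a temporary copy instead of the frame itself.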
You can iterate over each column in the results table:
In [159]: for row in results["Winner"]:
   .....:     print(row)
   .....:
Nadal
Djokovic
Nadal
Del Potro
So it's now relatively simple to update your ratings:
In [160]: for row in results["Winner"]:
   .....:     ir.loc[row, 'Rating'] += 1
   .....:
In [161]: for row in results["Loser"]:
   .....:     ir.loc[row, 'Rating'] -= 1
   .....:
In [162]: ir
Out[162]:
Rating
Nadal 2322
Djokovic 2280
Verdasco 2118
Federer 1889
Del Potro 1543
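The two separate loops above work for a fixed one-point change, but an Elo update depends on both players' current ratings, so each match probably needs to be processed as a whole and in chronological order. Below is a rough sketch of that idea. The bodies of `predicted_result` and `updated_rating` are stand-ins, since the question does not show them: they assume the standard Elo expected-score formula, a K-factor of 32, and that `updated_rating` takes the winner's rating first and returns both new ratings. Swap in your own functions as needed.

```python
import pandas as pd

K = 32  # assumed K-factor; not given in the question


def predicted_result(rating1, rating2):
    """Stand-in: expected score of player 1 against player 2 (standard Elo)."""
    return 1 / (1 + 10 ** ((rating2 - rating1) / 400))


def updated_rating(winner_rating, loser_rating):
    """Stand-in: return (new_winner_rating, new_loser_rating) after one match."""
    expected_win = predicted_result(winner_rating, loser_rating)
    delta = K * (1 - expected_win)
    return winner_rating + delta, loser_rating - delta


ratings = pd.DataFrame({
    "Player": ["Nadal", "Djokovic", "Verdasco", "Federer", "Del Potro"],
    "Rating": [2320, 2280, 2120, 1890, 1542],
})
results = pd.DataFrame({
    "Winner": ["Nadal", "Djokovic", "Nadal", "Del Potro"],
    "Loser": ["Federer", "Verdasco", "Djokovic", "Verdasco"],
})

# Index by player name; use floats so fractional updates can be stored.
ir = ratings.set_index("Player").astype({"Rating": float})

# Walk the matches in order, one row (winner and loser) at a time.
for match in results.itertuples(index=False):
    winner, loser = match.Winner, match.Loser
    new_winner, new_loser = updated_rating(
        ir.loc[winner, "Rating"], ir.loc[loser, "Rating"]
    )
    # Save both updated ratings back into the player table.
    ir.loc[winner, "Rating"] = new_winner
    ir.loc[loser, "Rating"] = new_loser
```

Because each iteration writes back to `ir` before the next match is read, a player's later matches use the rating produced by their earlier ones. If you need the original layout again, `ir.reset_index()` turns the player names back into a column.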