
PageRank Theory: Unassisted Goal Scoring in R with igraph

I'm trying to analyze goal-scoring networks in hockey. I have data for the player who scored each goal and the player who assisted on it. My issue is that some goals do not have an assist, so I'm not sure what I should do in those situations.

An example of my data looks like this:

scorer <- c("Lidstrom", "Yzerman", "Fedorov", "Yzerman", "Shanahan")
assister <- c("", "Lidstrom", "Yzerman", "Shanahan", "Lidstrom")

mydata <- data.frame(scorer, assister)

And the output is:

    scorer assister
1 Lidstrom         
2  Yzerman Lidstrom
3  Fedorov  Yzerman
4  Yzerman Shanahan
5 Shanahan Lidstrom

When I'm dealing with unassisted goals, does it make sense to act as if the assist goes to the scorer?

Example:

    scorer assister
1 Lidstrom Lidstrom        
2  Yzerman Lidstrom
3  Fedorov  Yzerman
4  Yzerman Shanahan
5 Shanahan Lidstrom

Or does it make sense to create a new name "UNASSISTED" for unassisted goals?

Example:

    scorer assister
1 Lidstrom UNASSISTED       
2  Yzerman Lidstrom
3  Fedorov  Yzerman
4  Yzerman Shanahan
5 Shanahan Lidstrom

Here's the rest of my code for the PageRank, assuming that something is filled in for the blank assister space:

library(igraph)
library(dplyr)

# Each row of mydata becomes a directed edge scorer -> assister.
my_network <- mydata %>%
  as.matrix() %>%
  graph_from_edgelist(directed = TRUE)

page_rank(my_network, directed = TRUE)$vector

I can't just remove goals that are unassisted, so I'm trying to come up with a solution that doesn't violate any major graph theory principles (which I don't know well). Any ideas?

I agree with the suggestion of @emilliman5 outlined in the comments: for unassisted goals, just make an edge from the scorer to himself. Then use PageRank to find the most influential players. Actually, PageRank can be a particularly good choice here because the principles underlying the PageRank score bear some similarity to what is going on in a "real" hockey match.
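As a minimal sketch of the self-loop suggestion, using the example data from the question: the blank assister is replaced by the scorer's own name before the graph is built. The helper name `assister_filled` is only illustrative, and `graph_from_edgelist()` is assumed to be available (it is the current igraph name for `graph.edgelist()`).

```r
library(igraph)

scorer   <- c("Lidstrom", "Yzerman", "Fedorov", "Yzerman", "Shanahan")
assister <- c("", "Lidstrom", "Yzerman", "Shanahan", "Lidstrom")

# Treat an unassisted goal as a pass from the scorer to himself (a self-loop).
assister_filled <- ifelse(assister == "", scorer, assister)

# Same column order as the question: each row is an edge scorer -> assister.
edges <- cbind(scorer, assister_filled)
my_network <- graph_from_edgelist(edges, directed = TRUE)

ranks <- page_rank(my_network, directed = TRUE)$vector
sort(ranks, decreasing = TRUE)
```

Note that with this column order the edges point from scorer to assister, so rank accumulates on the players who give assists. If you would rather follow the puck from assister to scorer, swapping the two columns in `cbind()` should do it.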

Let me elaborate on this a bit. PageRank was originally invented to model the behaviour of a randomly chosen Internet user browsing pages on the web. In each time step, the user either follows a link on the page currently being viewed, or surfs to another, unrelated page chosen uniformly from the set of all pages on the Internet. A fixed probability decides which of the two happens: the user follows a link with probability typically set to 0.85 and "teleports" to a randomly chosen page with the remaining probability, typically 0.15. The idea behind PageRank is that the most important pages are the ones where the user is likely to spend a lot of time when following the rules above. The behaviour of the user is essentially a random walk over the set of webpages.
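To make the random walk concrete, here is a rough base-R sketch of the iteration described above, run on the question's example with the unassisted goal stored as a self-loop. It is an illustration of the idea under a damping value of 0.85, not igraph's actual implementation, and the variable names (`A`, `P`, `pr`) are my own.

```r
# Edge list with the unassisted goal stored as a self-loop (scorer -> assister,
# matching the column order in the question).
from <- c("Lidstrom", "Yzerman", "Fedorov", "Yzerman", "Shanahan")
to   <- c("Lidstrom", "Lidstrom", "Yzerman", "Shanahan", "Lidstrom")

players <- sort(unique(c(from, to)))
n <- length(players)

# A[i, j] counts the edges from player i to player j.
A <- matrix(0, n, n, dimnames = list(players, players))
for (k in seq_along(from)) {
  A[from[k], to[k]] <- A[from[k], to[k]] + 1
}

# Row-normalise to get the "follow a link" transition probabilities.
# A player with no outgoing edges would jump uniformly at random instead.
out_degree <- rowSums(A)
P <- A / ifelse(out_degree == 0, 1, out_degree)
P[out_degree == 0, ] <- 1 / n

damping <- 0.85                          # probability of following a link
pr <- setNames(rep(1 / n, n), players)   # start from the uniform distribution

for (iter in 1:200) {
  # Follow a link with probability `damping`, otherwise teleport uniformly.
  new_pr <- damping * as.vector(pr %*% P) + (1 - damping) / n
  converged <- sum(abs(new_pr - pr)) < 1e-12
  pr <- setNames(new_pr, players)
  if (converged) break
}

round(pr, 4)
```

The vector `pr` approximates the long-run share of time the walker spends at each player, which is what the PageRank score measures.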

Now, in a hockey game, the "user" is the hockey puck that is being passed from player to player. At each step, the puck is either passed from one player to another, or a goal is scored, or the puck is accidentally passed to the opposing team. In the latter two cases, the puck ends up with the opposing team, and eventually it is returned to a randomly chosen player on the first team. (This is a first approximation; if you want to go deeper, you could keep "tracking" the puck for the opposing team as well.) I think you can start seeing the similarities here. The assister-to-scorer network that you have captures a fragment of this, namely the last pass before each goal. From this point of view, I think it totally makes sense to think about unassisted goals as events where the player passed to himself before scoring.

Of course, you would have a much better understanding of the team dynamics if your dataset contained all the passes, not only the ones that resulted in a goal. In that case, you could add an additional node called "GOAL" to your network, draw edges from the scorers to the "GOAL" node, and then calculate the so-called personalized PageRank vector for the "GOAL" node, which would give you the most influential nodes from which the "GOAL" node is the easiest to reach. But from this point onwards this is more of a research question, and it is probably not a good fit for further discussion on Stack Overflow.
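A rough sketch of the "GOAL" node idea might look like the following. The pass log and goals here are entirely hypothetical (the question's data has no full pass history), and reversing the edges is my own assumption about how to read "nodes from which GOAL is easiest to reach": a walk that restarts at "GOAL" on the reversed graph moves backwards from goals to scorers to earlier passers.

```r
library(igraph)

# HYPOTHETICAL pass log (not from the question): each row is passer -> receiver.
passes <- rbind(
  c("Lidstrom", "Yzerman"),
  c("Yzerman",  "Fedorov"),
  c("Fedorov",  "Shanahan"),
  c("Lidstrom", "Shanahan"),
  c("Shanahan", "Yzerman")
)
# HYPOTHETICAL goals: each scorer gets an edge to the artificial "GOAL" node.
goals <- cbind(c("Yzerman", "Shanahan", "Shanahan"), "GOAL")

edges <- rbind(passes, goals)

# Assumption: reverse every edge so that a walk restarted at "GOAL" traces
# puck movement backwards, from goals to scorers to earlier passers.
reversed <- graph_from_edgelist(edges[, 2:1], directed = TRUE)

# Teleport only to "GOAL" instead of uniformly to every node.
restart <- as.numeric(V(reversed)$name == "GOAL")
ppr <- page_rank(reversed, directed = TRUE, personalized = restart)$vector

# Drop the artificial node and list the players by score.
sort(ppr[names(ppr) != "GOAL"], decreasing = TRUE)
```

Whether to reverse the edges, and how to weight repeated passes, are modelling choices rather than settled facts, so treat the resulting ordering as exploratory.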
