
Poisson Distribution considering time left

I want to calculate the remaining probabilities for each result in a football game at minute n.

In this case, I have expected goals of 2.69 for the home team and 1.12 for the away team, and at minute 70 the current result is 2-1.

Code

from scipy.stats import poisson
from itertools import product
import numpy as np
import pandas as pd

xgh = 2.69
xga = 1.12

minute = 70

hg, ag = 2,1
phs=[]
pas=[]
for i, l in zip(range(0, 6), range(0, 6)):
  ph = poisson.pmf(mu=xgh, k=i, loc=hg)
  phs.append(ph)
  pa = poisson.pmf(mu=xga, k=l, loc=ag)
  pas.append(pa)

prod_table = np.array([(i*j) for i, j in product(phs, pas)])
prod_table.shape = (6, 6)

prob_df = pd.DataFrame(prod_table, index=range(0,6), columns=range(0, 6))

This returns a probability of 2.21% for a 2-1 final result, which is pretty low. I expected a high probability, considering that only 20 minutes are left.

Math considerations

The Poisson distribution gives the probability that an event occurs k times in a given time frame, knowing that, on average, it occurs μ times in that same time frame.

The postulate of the Poisson distribution is that events are totally independent, so how many times the event has already occurred is meaningless. It also assumes that events are uniformly distributed over time (if I may use this confusing word, since this is not a uniform distribution).

Most of the time, Poisson is used to compute the probability that k events occur in a time frame T, when we know that μ events occur on average in a time frame τ (the difference from the first sentence being that T and τ are not the same).

But that is the easy part: since events are uniformly distributed, if μ events occur on average in a time frame τ, then μ×T/τ events should occur, on average, in a time frame T (understand: if we were to observe millions of time frames of length T, then on average there would be μT/τ events in each of them).

So, to compute the probability that the event occurs k times in time frame T, knowing that it occurs μ times on average in time frame τ, you just have to answer the question "what is the probability that the event occurs k times in time frame T, knowing that it occurs μT/τ times on average in that same time frame?". That is a question Poisson can answer.

In Python, that answer is poisson.pmf(k, μT/τ).
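As a minimal sketch of that rescaling, here is a small helper. The function name prob_k_events and the numbers in the example (3 events per 60 minutes) are made up for illustration; only poisson.pmf comes from scipy.

```python
from scipy.stats import poisson

def prob_k_events(k, mu, tau, T):
    """Probability of exactly k events in a window of length T,
    given an average of mu events per window of length tau.
    (Hypothetical helper, not part of scipy.)"""
    scaled_mu = mu * T / tau  # expected number of events in the window T
    return poisson.pmf(k, scaled_mu)

# Made-up example: 3 events expected per 60 minutes,
# probability of exactly 1 event in a 20 minute window.
p = prob_k_events(k=1, mu=3.0, tau=60.0, T=20.0)
print(p)
```

Here the scaled mean is 3×20/60 = 1, so the call is the same as poisson.pmf(1, 1.0).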

In your case, you know μ, the number of goals expected in a 90 minute time frame. You know that the time frame left to score is 20 minutes. If 2.69 goals are expected in a time frame of 90 minutes, then 2.69×20/90 ≈ 0.5978 goals are expected in a time frame of 20 minutes (at least, Poisson postulates that things work that way). Therefore, the probability that this team scores no other goal in that time frame is poisson.pmf(0, 0.5978). Or, using your keyword style, poisson.pmf(mu=0.5978, k=0). Or, using loc so that k is the total number of goals, poisson.pmf(mu=0.5978, k=2, loc=2) (but that is just cosmetic: passing a loc parameter simply replaces k by k-loc).
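To check that the two spellings agree, here is a short sketch using the numbers from the question. The variable names mu_left, p_extra and p_total are introduced here for illustration.

```python
from scipy.stats import poisson

xgh = 2.69     # expected home goals over 90 minutes (from the question)
minute = 70
hg = 2         # home goals already scored

# Expected home goals in the remaining 20 minutes, about 0.5978
mu_left = xgh * (90 - minute) / 90

# Probability of no additional home goal
p_extra = poisson.pmf(0, mu_left)

# Same probability, expressed as "home team finishes on exactly hg goals"
p_total = poisson.pmf(k=hg, mu=mu_left, loc=hg)

print(round(mu_left, 4))
print(p_extra, p_total)
```

Both values come out at roughly 0.55, which is what you would expect if loc only shifts k.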

tl;dr solution

So, long story short, you just need to scale down xgh and xga so that they reflect the expected number of goals in the remaining time.

for i, l in zip(range(0, 6), range(0, 6)):
  ph = poisson.pmf(mu=xgh*(90-minute)/90, k=i, loc=hg)
  phs.append(ph)
  pa = poisson.pmf(mu=xga*(90-minute)/90, k=l, loc=ag)
  pas.append(pa)
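For reference, here is a self-contained sketch of the question's script with that scaling applied. It keeps the question's product-based table, and the names frac_left and p_unchanged are introduced here for illustration.

```python
from itertools import product

import numpy as np
import pandas as pd
from scipy.stats import poisson

xgh, xga = 2.69, 1.12   # expected goals over 90 minutes
minute = 70
hg, ag = 2, 1           # current score

frac_left = (90 - minute) / 90   # share of the match still to play

phs, pas = [], []
for k in range(6):
    # loc shifts k, so k is the FINAL number of goals of each team
    phs.append(poisson.pmf(mu=xgh * frac_left, k=k, loc=hg))
    pas.append(poisson.pmf(mu=xga * frac_left, k=k, loc=ag))

prod_table = np.array([i * j for i, j in product(phs, pas)])
prod_table.shape = (6, 6)
prob_df = pd.DataFrame(prod_table, index=range(6), columns=range(6))

# Probability that the score stays at 2-1
p_unchanged = prob_df.loc[hg, ag]
print(f"P(final score {hg}-{ag}) = {p_unchanged:.4f}")
```

With these inputs the 2-1 cell should be around 43% instead of 2.21%. Note that the table stops at 5 goals per team, so its cells sum to slightly less than 1.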

Other comments

zip

While at it, and since there is a python tag, here are some comments on the code.

for i, l in zip(range(0, 6), range(0, 6)):
    print(i,l)

produces

0 0
1 1
2 2
3 3
4 4
5 5

So it is quite strange not to use a single variable. Especially if you consider that you could not use different ranges anyway: zip stops as soon as its shortest iterable is exhausted, so the two variables always advance in lockstep. And I don't see under which circumstances we would need, for example, i to grow from 0 to 5 while l grows from 0 to 10.
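A quick sketch of that behaviour, with two ranges of different lengths (the names pairs and same are just for this illustration):

```python
# zip stops when the shorter iterable runs out,
# so the longer range never gets past 5 here.
pairs = list(zip(range(0, 6), range(0, 11)))
print(pairs)

# Which is exactly what a single loop variable gives you
same = [(k, k) for k in range(6)]
print(pairs == same)
```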

So just

for k in range(0, 6):
  ph = poisson.pmf(mu=xgh*(90-minute)/90, k=k, loc=hg)
  phs.append(ph)
  pa = poisson.pmf(mu=xga*(90-minute)/90, k=k, loc=ag)
  pas.append(pa)

I surmise, especially given the subject of the next remark, that once upon a time there was a product instead of that zip, before you realized that it was computing the exact same pmf several times.

Cross product

That usage of product has probably then been reduced to the task of computing phs[i]×pas[j] for all i, j. That is a good usage of product.

But, since you have 2 arrays, and you intend to build a numpy array from those phs[i]×pas[j], let numpy do the job. It will be more efficient at it.

prod_table = np.array(phs).reshape(-1,1)*np.array(pas)
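To convince yourself that the broadcasting version builds the same table, here is a small check on made-up values (the three-element lists are placeholders, not probabilities from the question). np.outer is shown as one more equivalent spelling.

```python
from itertools import product

import numpy as np

phs = [0.5, 0.3, 0.2]   # made-up values, for illustration only
pas = [0.7, 0.2, 0.1]

# The question's approach: flat list of products, then reshape
via_product = np.array([i * j for i, j in product(phs, pas)]).reshape(3, 3)

# Broadcasting: a (3, 1) column times a (3,) row gives a (3, 3) table
via_broadcast = np.array(phs).reshape(-1, 1) * np.array(pas)

# np.outer computes the same table
via_outer = np.outer(phs, pas)

print(np.allclose(via_product, via_broadcast))
print(np.allclose(via_broadcast, via_outer))
```

In all three cases, row i and column j hold phs[i]×pas[j].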

Getting arrays directly from Poisson

Which leads to another optimization. If the goal is to transform phs and pas into arrays, so that we can multiply them (one as a row, the other as a column) to get the table, why not let pmf build those arrays directly? Like many numpy functions, pmf accepts an array-like k rather than a scalar, and then returns an array rather than a scalar.

So

phs=poisson.pmf(mu=xgh*(90-minute)/90, k=range(6), loc=hg)
pas=poisson.pmf(mu=xga*(90-minute)/90, k=range(6), loc=ag)

So, altogether

prod_table=poisson.pmf(mu=xgh*(90-minute)/90, k=range(6), loc=hg).reshape(-1,1)*poisson.pmf(mu=xga*(90-minute)/90, k=range(6), loc=ag)
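If you want to reuse this, the one-liner could be wrapped in a function. This is only a sketch: the function name score_table and the max_goals parameter are hypothetical, and it assumes a 90 minute match with no stoppage time.

```python
import numpy as np
import pandas as pd
from scipy.stats import poisson

def score_table(xgh, xga, minute, hg, ag, max_goals=6):
    """Table of final-score probabilities (rows: home goals, columns: away goals).
    Hypothetical helper; assumes goals follow independent Poisson laws
    and that the match lasts exactly 90 minutes."""
    frac_left = (90 - minute) / 90
    k = np.arange(max_goals)  # candidate FINAL goal counts: 0 .. max_goals-1
    home = poisson.pmf(k, xgh * frac_left, loc=hg)
    away = poisson.pmf(k, xga * frac_left, loc=ag)
    table = home.reshape(-1, 1) * away
    return pd.DataFrame(table, index=k, columns=k)

prob_df = score_table(xgh=2.69, xga=1.12, minute=70, hg=2, ag=1)
print(prob_df.round(4))
```

Scores below the current one (for example 1-1) get probability 0, and the table is truncated at max_goals - 1 goals per team, so it sums to a bit less than 1.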

Timings

Optimisations    Time in μs
Without          1647
With             329

So, it is not just more compact and readable. It is also (almost exactly) 5 times faster.
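The answer does not show how the timings were taken. One way to reproduce the comparison is sketched below with timeit. The function names table_loop and table_vectorized and the repetition count are choices made for this sketch, and the absolute numbers will differ from machine to machine.

```python
import timeit
from itertools import product

import numpy as np
from scipy.stats import poisson

xgh, xga, minute, hg, ag = 2.69, 1.12, 70, 2, 1
frac_left = (90 - minute) / 90

def table_loop():
    # Version without the optimisations: scalar pmf calls plus itertools.product
    phs, pas = [], []
    for k in range(6):
        phs.append(poisson.pmf(mu=xgh * frac_left, k=k, loc=hg))
        pas.append(poisson.pmf(mu=xga * frac_left, k=k, loc=ag))
    table = np.array([i * j for i, j in product(phs, pas)])
    table.shape = (6, 6)
    return table

def table_vectorized():
    # Version with the optimisations: array-valued pmf plus broadcasting
    k = np.arange(6)
    home = poisson.pmf(mu=xgh * frac_left, k=k, loc=hg)
    away = poisson.pmf(mu=xga * frac_left, k=k, loc=ag)
    return home.reshape(-1, 1) * away

n = 200  # number of repetitions, arbitrary
t_loop = timeit.timeit(table_loop, number=n) / n * 1e6
t_vec = timeit.timeit(table_vectorized, number=n) / n * 1e6
print(f"without: {t_loop:.0f} μs, with: {t_vec:.0f} μs")
```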
