Finding Poisson distribution from a data set in Java

I have a large data set in Excel. I want to find out, in Java, whether the numbers follow a Poisson distribution or a binomial distribution. Is there an open source library that would help me get this done? I'm looking at Apache Commons Math.

Any pointers would help.

It sounds like you have a (relatively simple) model fitting problem, and you are trying to choose between two distributions. The usual way to do this is as follows.

  1. Estimate the parameters p_poisson of the Poisson distribution from your data.
  2. Estimate the parameters p_binomial of the binomial distribution from your data.
  3. Compute p(data | p_poisson) and p(data | p_binomial) (the likelihood function) and choose the model with the higher likelihood.
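
The three steps above can be sketched in plain Java. This is only an illustration under stated assumptions, not a definitive implementation: the class and method names are made up for this example, the data are assumed to be non-negative integer counts, and the binomial number of trials `n` is assumed to be known in advance (so only `p` is estimated). It uses the standard maximum-likelihood estimates, the sample mean for the Poisson rate and mean / n for the binomial `p`, and compares log-likelihoods rather than raw likelihoods to avoid numerical underflow.

```java
final class DistributionFit {

    // x * log(y), defined as 0 when x == 0 so that 0 * log(0) does not yield NaN.
    static double xlogy(double x, double y) {
        return x == 0.0 ? 0.0 : x * Math.log(y);
    }

    // log(k!) as a plain sum of logs; adequate for moderate counts.
    static double logFactorial(int k) {
        double s = 0.0;
        for (int i = 2; i <= k; i++) {
            s += Math.log(i);
        }
        return s;
    }

    static double mean(int[] data) {
        double sum = 0.0;
        for (int x : data) {
            sum += x;
        }
        return sum / data.length;
    }

    // Step 1: the maximum-likelihood estimate of the Poisson rate is the sample mean.
    static double poissonMle(int[] data) {
        return mean(data);
    }

    // Step 2: with n assumed known, the maximum-likelihood estimate of p is mean / n.
    static double binomialMle(int[] data, int n) {
        return mean(data) / n;
    }

    // Step 3a: log p(data | lambda) under the Poisson model.
    static double poissonLogLik(int[] data, double lambda) {
        double ll = 0.0;
        for (int k : data) {
            ll += xlogy(k, lambda) - lambda - logFactorial(k);
        }
        return ll;
    }

    // Step 3b: log p(data | n, p) under the binomial model.
    static double binomialLogLik(int[] data, int n, double p) {
        double ll = 0.0;
        for (int k : data) {
            if (k > n) {
                return Double.NEGATIVE_INFINITY; // a count above n is impossible
            }
            double logChoose = logFactorial(n) - logFactorial(k) - logFactorial(n - k);
            ll += logChoose + xlogy(k, p) + xlogy(n - k, 1.0 - p);
        }
        return ll;
    }

    public static void main(String[] args) {
        int[] counts = {2, 3, 1, 4, 2, 3}; // hypothetical data, for illustration only
        int n = 10;                        // assumed known number of trials

        double llPoisson = poissonLogLik(counts, poissonMle(counts));
        double llBinomial = binomialLogLik(counts, n, binomialMle(counts, n));

        System.out.println("Poisson log-likelihood:  " + llPoisson);
        System.out.println("Binomial log-likelihood: " + llBinomial);
        System.out.println(llPoisson > llBinomial ? "Poisson fits better" : "Binomial fits better");
    }
}
```

In practice the counts would be read from the Excel export (for example a CSV file) instead of being hard-coded. If `n` is not known it has to be estimated as well, which is where the extra binomial parameter comes from.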

For more generality, I would recommend looking at AIC, BIC, and general information on model selection. In this case, if you don't have a ton of data, the binomial distribution should be penalized slightly for the possibility of overfitting, because it has more parameters than the Poisson.
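
As a rough sketch of how that penalty works, the standard definitions are AIC = 2k - 2 ln L and BIC = k ln(m) - 2 ln L, where k is the number of fitted parameters, m is the number of observations, and ln L is the maximized log-likelihood. The helper class below is hypothetical (the names are invented for this example) and assumes the log-likelihoods have already been computed elsewhere.

```java
final class InformationCriteria {

    // AIC = 2k - 2 ln L; lower is better.
    static double aic(double logLik, int numParams) {
        return 2.0 * numParams - 2.0 * logLik;
    }

    // BIC = k ln(m) - 2 ln L, with m observations; lower is better.
    static double bic(double logLik, int numParams, int numObs) {
        return numParams * Math.log(numObs) - 2.0 * logLik;
    }
}
```

With this, the Poisson model would be scored with one parameter (its rate) and the binomial model with two (n and p) when both are estimated from the data; the model with the lower AIC or BIC is preferred. If the two log-likelihoods are nearly equal, the extra parameter is enough to tip the choice toward the Poisson.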