Why is 'curve' so different from 'lines' and 'points' in R?

Question

I would like to fit my frequency data with a discrete generalized beta distribution (DGBD).

The data look like this:

freq = c(1116, 2067, 137, 124, 643, 2042, 55, 47186, 7504, 1488, 211, 1608,
         3517, 7, 896, 378, 17, 3098, 164977, 601, 196, 637, 149, 44, 2, 1801,
         882, 636, 5184, 1851, 776, 343, 851, 33, 4011, 209, 715, 937, 20,
         6922, 2028, 23, 3045, 16, 334, 31, 2)

Rank = rank(-freq, ties.method = "first")
p = freq/sum(freq)
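A small illustration of what the two lines above do, on hypothetical toy frequencies (not the question's data) that include a tie: negating the frequencies makes the largest one rank 1, and `ties.method = "first"` breaks ties by position so the ranks are exactly 1 to N with no duplicates.

```r
# Hypothetical toy frequencies (not the question's data); two entries tie at 2
toy <- c(10, 2, 2, 50)

# rank() ranks in increasing order, so negating puts the largest frequency at rank 1.
# ties.method = "first" breaks ties by position, so the ranks are exactly 1..length(toy).
toy_rank <- rank(-toy, ties.method = "first")
print(toy_rank)  # [1] 2 3 4 1

# Relative frequencies, as in p = freq/sum(freq)
toy_p <- toy / sum(toy)
print(sum(toy_p))  # [1] 1
```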

Take logarithms:

log.f = log(freq)
log.p = log(p)
log.rank = log(Rank)
log.inverse.rank = log(length(Rank)+1-Rank)

Fit the DGBD by linear regression on the log scale:

co=coef(lm(log.p~log.inverse.rank + log.rank))
zmf = function(x) exp(co[[1]]+ co[[2]]*log(length(x)+1-x) + co[[3]]*log(x))
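For reference, the DGBD is commonly written as follows for ranks r = 1 to N (this parameterization is assumed here; the question does not state it). Taking logarithms turns it into exactly the linear model that the `lm()` call fits, with `log.inverse.rank` = log(N+1-r) and `log.rank` = log r:

```latex
\begin{aligned}
p(r) &= A\,\frac{(N+1-r)^{b}}{r^{a}}, \qquad r = 1, \dots, N,\\
\log p(r) &= \log A + b\,\log(N+1-r) - a\,\log r .
\end{aligned}
```

So `co[[1]]` estimates log A, `co[[2]]` estimates b, and `co[[3]]` estimates -a. N is the number of ranked items, `length(Rank)`, which is a property of the data rather than of whatever `x` vector is passed to `zmf`.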

Plot:

plot(p~Rank, xlim = c(1, 80), log = "xy",xlab = "Rank (log)", ylab = "Probability (log)")
curve(zmf, col="blue", add = T)
xx=c(1:length(Rank))
lines(zmf(xx)~xx, col = "red")
points(zmf(xx)~xx, col = "purple")


Figure 1. The plot looks like this.

My question is: what is the right way to display the result, lines (points) or curve?

Update:

Although I have not figured out the underlying logic, I have found the solution:

@Frank pointed out the trick of setting n in curve, and it solves the problem. So n in curve is necessary when we try to fit the raw data, although in many situations n is left unspecified.

plot(p~Rank, log = "xy",xlab = "Rank (log)", ylab = "Probability (log)")
curve(zmf, col="blue", add = T, n = length(Rank)) # set the number of x values at which to evaluate zmf


Figure 2. The right way to use curve: specify n.

Answer 1

You need to specify n here because your function depends on length(x)!

zmf = function(x) exp(co[[1]]+ co[[2]]*log(length(x)+1-x) + co[[3]]*log(x))
                                           ^^^^^^^^^

Here, the length of the x vector that curve passes to your function is n!
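A minimal sketch to confirm this, using a hypothetical `probe` function (not part of the question) that records `length(x)` each time `curve()` calls it. `pdf(NULL)` is used only so the snippet runs without opening a plot window.

```r
pdf(NULL)  # null graphics device: nothing is shown or written to a file

# Hypothetical probe: records the length of the x vector it receives, returns x unchanged
seen_len <- NA_integer_
probe <- function(x) {
  seen_len <<- length(x)
  x
}

curve(probe, from = 1, to = 47)          # default n = 101
len_default <- seen_len

curve(probe, from = 1, to = 47, n = 47)  # explicit n
len_n47 <- seen_len

print(c(default = len_default, explicit = len_n47))

invisible(dev.off())
```

`curve()` evaluates the function once on the whole vector of n x values (and requires a result of the same length), so any `length(x)` inside the function sees n: 101 by default, 47 when n = 47 is passed.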

Here is your plot if you stick with the default n = 101 but feed lines and points a vector xx of length 101:

plot(p~Rank, xlim = c(1,80), log = "xy",xlab = "Rank (log)", ylab = "Probability (log)")
curve(zmf, col="blue", add = T)
xx=seq(1,length(Rank),length.out=101)
lines(zmf(xx)~xx, col = "red")
points(zmf(xx)~xx, col = "purple")


Neither voodoo nor a bug! :)
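One way to make `curve()`, `lines()` and `points()` agree regardless of n is to stop deriving N from the input. The sketch below is not the answerer's code; `zmf_fixed`, `N`, `xx` and `grid` are names chosen here for illustration. It fixes N = `length(Rank)` once and passes `from`/`to` to `curve()` so it only evaluates inside [1, N], where N + 1 - x > 0 (without them, an added curve spans the whole plot range and `log(N + 1 - x)` becomes NaN past x = N + 1).

```r
# Data and fit from the question
freq <- c(1116, 2067, 137, 124, 643, 2042, 55, 47186, 7504, 1488, 211, 1608,
          3517, 7, 896, 378, 17, 3098, 164977, 601, 196, 637, 149, 44, 2, 1801,
          882, 636, 5184, 1851, 776, 343, 851, 33, 4011, 209, 715, 937, 20,
          6922, 2028, 23, 3045, 16, 334, 31, 2)
Rank <- rank(-freq, ties.method = "first")
p <- freq / sum(freq)
log.p <- log(p)
log.rank <- log(Rank)
log.inverse.rank <- log(length(Rank) + 1 - Rank)
co <- coef(lm(log.p ~ log.inverse.rank + log.rank))

# Original function: N is read from length(x), i.e. from whatever vector is passed in
zmf <- function(x) exp(co[[1]] + co[[2]] * log(length(x) + 1 - x) + co[[3]] * log(x))

# Hypothetical fix: N is a property of the data, so capture it once
N <- length(Rank)
zmf_fixed <- function(x) exp(co[[1]] + co[[2]] * log(N + 1 - x) + co[[3]] * log(x))

xx <- 1:N                            # one point per observed rank
grid <- seq(1, N, length.out = 101)  # a 101-point grid, like curve()'s default n

pdf(NULL)  # null device so the sketch runs without a window; drop this for interactive use
plot(p ~ Rank, log = "xy", xlab = "Rank (log)", ylab = "Probability (log)")
# from/to keep x inside [1, N]; the default n = 101 is now harmless
curve(zmf_fixed, from = 1, to = N, add = TRUE, col = "blue")
lines(zmf_fixed(xx) ~ xx, col = "red")
points(zmf_fixed(xx) ~ xx, col = "purple")
invisible(dev.off())
```

With this version, `zmf_fixed(xx)` equals the original `zmf(xx)` when `xx = 1:N`, but unlike `zmf` it returns the same value at a given rank whatever the length of the input vector, so the blue curve and the red/purple lines and points trace the same fitted function.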

Answer 1 (score 3), posted 2014-03-28 09:22:58

为什么'曲线'与R中的'线'和'点'如此不同？

问题描述

get the log forms 获取日志表单

linear regression of the discrete generalized beta distribution 离散广义β分布的线性回归

plot 情节

Figure 1. the plot looks like this 图1.情节看起来像这样

My question is what is the right way to demonstrate the result? 我的问题是证明结果的正确方法是什么？ lines (points) or curve? 线（点）或曲线？

Update: 更新：

Figure 2 The right way to use curve: specify the 'n' 图2使用曲线的正确方法：指定'n'

1 个解决方案

解决方案1 3 2014-03-28 09:22:58

解决方案1
3 2014-03-28 09:22:58