Why is 'curve' so different from 'lines' and 'points' in R?
I would like to fit frequency data with the discrete generalized beta distribution (DGBD).
The data look like this:
freq <- c(1116, 2067, 137, 124, 643, 2042, 55, 47186, 7504, 1488, 211, 1608,
          3517, 7, 896, 378, 17, 3098, 164977, 601, 196, 637, 149, 44, 2, 1801,
          882, 636, 5184, 1851, 776, 343, 851, 33, 4011, 209, 715,
          937, 20, 6922, 2028, 23, 3045, 16, 334, 31, 2)
Rank <- rank(-freq, ties.method = "first")  # rank 1 = most frequent item
p <- freq / sum(freq)                       # empirical probabilities
log.f <- log(freq)
log.p <- log(p)
log.rank <- log(Rank)
log.inverse.rank <- log(length(Rank) + 1 - Rank)
# Linear fit of log p on log(N + 1 - r) and log(r)
co <- coef(lm(log.p ~ log.inverse.rank + log.rank))
zmf <- function(x) exp(co[[1]] + co[[2]] * log(length(x) + 1 - x) + co[[3]] * log(x))
plot(p ~ Rank, xlim = c(1, 80), log = "xy", xlab = "Rank (log)", ylab = "Probability (log)")
curve(zmf, col = "blue", add = TRUE)
xx <- seq_along(Rank)
lines(zmf(xx) ~ xx, col = "red")
points(zmf(xx) ~ xx, col = "purple")
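For reference, the lm() call above fits the DGBD in its usual rank form, where N is the number of ranks (47 for this data) and A, a, b are the parameters:

$$ p(r) = A\,\frac{(N + 1 - r)^{b}}{r^{a}}, \qquad r = 1, \dots, N $$

$$ \log p(r) = \log A + b\,\log(N + 1 - r) - a\,\log r $$

So co[[1]] estimates log A, co[[2]] estimates b and co[[3]] estimates -a. In the formula N is fixed by the data, whereas zmf substitutes length(x) for it.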
Although I have not figured out the underlying logic, I found the solution:
@Frank reminded me of the trick of setting n in curve(), and it solves the problem. So n in curve() matters when we fit the raw data like this, even though in many situations it is left at its default.
plot(p ~ Rank, log = "xy", xlab = "Rank (log)", ylab = "Probability (log)")
curve(zmf, col = "blue", add = TRUE, n = length(Rank))  # set the number of x values at which to evaluate
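To check why n = length(Rank) works without drawing anything, the sketch below recomputes the fit from the question and evaluates zmf on a 47-point grid between rank 1 and rank N. The log-spaced grid is only an assumption standing in for whatever points curve() actually picks (those depend on the plot limits); grid and direct are illustrative names. Because the grid has exactly N points, length(x) + 1 - x is the same as N + 1 - x, so zmf reproduces the DGBD with N written out.

```r
freq <- c(1116, 2067, 137, 124, 643, 2042, 55, 47186, 7504, 1488, 211, 1608,
          3517, 7, 896, 378, 17, 3098, 164977, 601, 196, 637, 149, 44, 2, 1801,
          882, 636, 5184, 1851, 776, 343, 851, 33, 4011, 209, 715,
          937, 20, 6922, 2028, 23, 3045, 16, 334, 31, 2)
Rank <- rank(-freq, ties.method = "first")
p <- freq / sum(freq)
log.p <- log(p)
log.inverse.rank <- log(length(Rank) + 1 - Rank)
log.rank <- log(Rank)
co <- coef(lm(log.p ~ log.inverse.rank + log.rank))
zmf <- function(x) exp(co[[1]] + co[[2]] * log(length(x) + 1 - x) + co[[3]] * log(x))

N <- length(Rank)  # number of ranks: 47

# Assumed stand-in for curve()'s evaluation points: N log-spaced values in [1, N]
grid <- exp(seq(log(1), log(N), length.out = N))

# DGBD with the number of ranks N written out explicitly instead of length(x)
direct <- exp(co[[1]] + co[[2]] * log(N + 1 - grid) + co[[3]] * log(grid))

all.equal(zmf(grid), direct)  # TRUE, because length(grid) == N
```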
The reason you need to specify n here is that your function depends on length(x)!
# length(x) below is the length of whatever vector is passed to zmf, not the number of ranks
zmf <- function(x) exp(co[[1]] + co[[2]] * log(length(x) + 1 - x) + co[[3]] * log(x))
Here, the length of the x vector that curve provides to your function is n (101 by default)!
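The dependence can be seen without a plot by evaluating zmf at the same point inside vectors of different lengths. This sketch recomputes co and zmf from the question so it runs on its own; at.47 and at.101 are illustrative names. With x = 10 in both calls, the only thing that changes is length(x), and the ratio of the two results is exactly the (N + 1 - x) term raised to co[[2]].

```r
freq <- c(1116, 2067, 137, 124, 643, 2042, 55, 47186, 7504, 1488, 211, 1608,
          3517, 7, 896, 378, 17, 3098, 164977, 601, 196, 637, 149, 44, 2, 1801,
          882, 636, 5184, 1851, 776, 343, 851, 33, 4011, 209, 715,
          937, 20, 6922, 2028, 23, 3045, 16, 334, 31, 2)
Rank <- rank(-freq, ties.method = "first")
p <- freq / sum(freq)
log.p <- log(p)
log.inverse.rank <- log(length(Rank) + 1 - Rank)
log.rank <- log(Rank)
co <- coef(lm(log.p ~ log.inverse.rank + log.rank))
zmf <- function(x) exp(co[[1]] + co[[2]] * log(length(x) + 1 - x) + co[[3]] * log(x))

# Same x = 10, evaluated inside a vector of length 47 and one of length 101
at.47  <- zmf(rep(10, 47))[1]   # uses log(47 + 1 - 10) = log(38)
at.101 <- zmf(rep(10, 101))[1]  # uses log(101 + 1 - 10) = log(92)

isTRUE(all.equal(at.47, at.101))              # FALSE: same x, different length(x)
all.equal(at.101 / at.47, (92 / 38)^co[[2]])  # TRUE
```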
Here is your plot if you stick with the default n = 101 but feed your lines and points a vector xx of length 101:
plot(p ~ Rank, xlim = c(1, 80), log = "xy", xlab = "Rank (log)", ylab = "Probability (log)")
curve(zmf, col = "blue", add = TRUE)
xx <- seq(1, length(Rank), length.out = 101)  # 101 points, matching curve()'s default n
lines(zmf(xx) ~ xx, col = "red")
points(zmf(xx) ~ xx, col = "purple")
Neither voodoo nor bug! :)
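If you would rather have curve() draw the same line whatever n is, one option (a sketch, not something proposed in the thread) is to build the function with the number of ranks fixed when it is created, instead of reading it from length(x). make_dgbd and zmf_fixed are hypothetical names. Restricting from and to to the rank range also avoids evaluating log() where N + 1 - x is negative.

```r
freq <- c(1116, 2067, 137, 124, 643, 2042, 55, 47186, 7504, 1488, 211, 1608,
          3517, 7, 896, 378, 17, 3098, 164977, 601, 196, 637, 149, 44, 2, 1801,
          882, 636, 5184, 1851, 776, 343, 851, 33, 4011, 209, 715,
          937, 20, 6922, 2028, 23, 3045, 16, 334, 31, 2)
Rank <- rank(-freq, ties.method = "first")
p <- freq / sum(freq)
log.p <- log(p)
log.inverse.rank <- log(length(Rank) + 1 - Rank)
log.rank <- log(Rank)
co <- coef(lm(log.p ~ log.inverse.rank + log.rank))
zmf <- function(x) exp(co[[1]] + co[[2]] * log(length(x) + 1 - x) + co[[3]] * log(x))

# Hypothetical helper: capture N once, so the result no longer depends on length(x)
make_dgbd <- function(co, N) {
  force(co)
  force(N)
  function(x) exp(co[[1]] + co[[2]] * log(N + 1 - x) + co[[3]] * log(x))
}
zmf_fixed <- make_dgbd(co, N = length(Rank))

# Same value at x = 10 whether evaluated alone or inside a longer vector
all.equal(zmf_fixed(10), zmf_fixed(1:47)[10])  # TRUE
# On the 47 integer ranks it matches the original zmf, since there length(x) == 47
all.equal(zmf_fixed(1:47), zmf(1:47))          # TRUE

# With the plot open, the default n = 101 now draws the intended curve:
# curve(zmf_fixed, from = 1, to = length(Rank), col = "blue", add = TRUE)
```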