
How to calculate zipf exponent in R?

The generalized Zipf's law states that, if we rank a collection of n objects in non-decreasing order according to their size, the product of a power of the rank and the size of each object is constant throughout the collection, i.e.

$$ r^{\alpha} \, z_r = \text{constant} $$

where r is the rank, z_r is the size of the r-th object, and alpha is Zipf's parameter.
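Taking logarithms of this relation shows why the exponent is usually read off a log-log plot. A short derivation, writing the constant as C (a symbol introduced here for illustration, not part of the original statement):

```latex
r^{\alpha} \, z_r = C
\quad\Longrightarrow\quad
\alpha \log r + \log z_r = \log C
\quad\Longrightarrow\quad
\log z_r = \log C - \alpha \log r
```

So, if the law holds, log z_r plotted against log r is a straight line with slope -alpha.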

I would like to calculate the exponent of the function that describes Zipf's law in my data, i.e. Zipf's parameter/exponent. My data are the following:

> dput(df)
structure(list(x = c(1.06936486607035, 1.3232662468642, 1.57716762765805, 
1.83106900845189, 2.08497038924574, 2.33887177003959, 2.59277315083344, 
2.84667453162729, 3.10057591242114, 3.35447729321498, 3.60837867400883, 
3.86228005480268, 4.11618143559653, 4.37008281639038, 4.62398419718422, 
4.87788557797807, 5.13178695877192, 5.38568833956577, 5.63958972035962, 
5.89349110115347, 6.14739248194731, 6.40129386274116, 6.65519524353501, 
6.90909662432886, 7.16299800512271, 7.41689938591655, 7.6708007667104, 
7.92470214750425, 8.1786035282981, 8.43250490909195, 8.6864062898858, 
8.94030767067964, 9.19420905147349, 9.44811043226734, 9.70201181306119, 
9.95591319385504, 10.2098145746489, 10.4637159554427, 10.7176173362366, 
10.9715187170304, 11.2254200978243, 11.4793214786181, 11.733222859412, 
11.9871242402058, 12.2410256209997, 12.4949270017935, 12.7488283825874, 
13.0027297633812, 13.2566311441751, 13.5105325249689, 13.7644339057628, 
14.0183352865566, 14.2722366673505, 14.5261380481443, 14.7800394289382, 
15.033940809732, 15.2878421905258, 15.5417435713197, 15.7956449521135, 
16.0495463329074, 16.3034477137012, 16.5573490944951, 16.8112504752889, 
17.0651518560828, 17.3190532368766, 17.5729546176705, 17.8268559984643, 
18.0807573792582, 18.334658760052, 18.5885601408459, 18.8424615216397, 
19.0963629024336, 19.3502642832274, 19.6041656640213, 19.8580670448151, 
20.111968425609, 20.3658698064028, 20.6197711871967, 20.8736725679905, 
21.1275739487844, 21.3814753295782, 21.6353767103721, 21.8892780911659, 
22.1431794719598, 22.3970808527536, 22.6509822335474, 22.9048836143413, 
23.1587849951351, 23.412686375929, 23.6665877567228, 23.9204891375167, 
24.1743905183105, 24.4282918991044, 24.6821932798982, 24.9360946606921
), y = c(-2.97228886692625, -2.95440976170107, -2.93928459279152, 
-2.92685672250007, -2.91707897563357, -2.91054871731668, -2.90679861996743, 
-2.90554785065139, -2.90675006309313, -2.91036572966993, -2.91696470816554, 
-2.92597057051316, -2.93718053039632, -2.95054999876795, -2.96603736909913, 
-2.98406085693689, -3.00405379487858, -3.02588740495999, -3.04950848046858, 
-3.07486235427239, -3.10210692287855, -3.13082120061712, -3.16091945841148, 
-3.19233074728207, -3.2249788128355, -3.25869455640463, -3.29332682179158, 
-3.32879100108009, -3.36499680219032, -3.40182490231023, -3.43885206667123, 
-3.47620809318544, -3.51379996912, -3.55153068719991, -3.58922700390204, 
-3.62638735300239, -3.66333893302836, -3.70000206447245, -3.73629766644354, 
-3.77202286263472, -3.80675861286812, -3.84092561898948, -3.87447795824782, 
-3.90737403696004, -3.93941841852314, -3.97035436941129, -4.00059707307105, 
-4.03013619484928, -4.05896597861451, -4.0869186255246, -4.11388702659758, 
-4.14021632809744, -4.16591776523316, -4.19100561781447, -4.21534097913824, 
-4.23891623603497, -4.26199110836985, -4.2845881782092, -4.30673264230625, 
-4.32831990641116, -4.34940948783001, -4.37019317123469, -4.39070825537989, 
-4.41099561113224, -4.43100956980122, -4.45086204575797, -4.47069657988101, 
-4.49057219155847, -4.51055207455314, -4.53067990556232, -4.55108865901739, 
-4.57188026595389, -4.59313206537618, -4.61492481492843, -4.63739606090163, 
-4.6606424296565, -4.68472597512036, -4.70972674413206, -4.73572647040504, 
-4.76290834240927, -4.79128541380379, -4.820888083087, -4.85177836410324, 
-4.88401718152641, -4.91772243314579, -4.95283585285162, -4.98936501364757, 
-5.02733146115584, -5.0667505747377, -5.10751161205962, -5.14958042252788, 
-5.19293189917589, -5.23752155483956, -5.28329086087404, -5.3297251199846
)), row.names = c(NA, -95L), class = c("tbl_df", "tbl", "data.frame"
))

They result from a kernel density estimate of the degree distribution of a network (i.e. the x axis is the degree and the y axis is the logarithm of the number of nodes with that degree).

How can I estimate Zipf's exponent from this dataset?

You can check out the gamlss package, which provides functions to fit the Zipf distribution (and other variants of it).

https://cran.r-project.org/web/packages/gamlss/gamlss.pdf (pg. 47)

# install.packages('gamlss')
library(gamlss)

gamlss(
   formula = ...,
   data = ...,
   family = ZIPF(mu.link = 'log')
)
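For intuition about what such a fit estimates, here is a minimal base-R sketch that maximizes the likelihood of a truncated Zipf distribution. It is not the gamlss implementation, and the synthetic sample, the support `1:1000`, and the names `true_s`, `neg_loglik` and `s_hat` are all assumptions made for illustration.

```r
# Synthetic integer data drawn from a truncated Zipf law (hypothetical values).
set.seed(1)
support <- 1:1000                                  # assumed finite support
true_s  <- 2                                       # assumed "true" exponent
probs   <- support^(-true_s) / sum(support^(-true_s))
y       <- sample(support, size = 5000, replace = TRUE, prob = probs)

# Negative log-likelihood of P(Y = k) = k^(-s) / sum_j j^(-s) over the support.
neg_loglik <- function(s) {
  length(y) * log(sum(support^(-s))) + s * sum(log(y))
}

# One-dimensional minimisation over a plausible range of exponents.
fit   <- optimize(neg_loglik, interval = c(1.01, 5))
s_hat <- fit$minimum
s_hat
```

Note that this works on raw integer observations (for example, the degree of each node), not on a smoothed density curve. A package may also parameterise the exponent differently, so check its documentation before comparing numbers.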

But I don't really understand your data. You said the x axis is the degree, so how come the values are not integers? Is your network a valued network?

Also, you said the y axis is the logarithm of the number of nodes. But that would imply the numbers of nodes are between 0 and 1, which doesn't really make sense.
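If the y values are taken at face value as the natural log of a density over degree x (an assumption, since the base of the logarithm is not stated), a rough estimate of the power-law exponent is the negative slope of a regression of y on log(x). The sketch below uses a synthetic curve with a known exponent; `gamma_true` and `gamma_hat` are hypothetical names.

```r
# Synthetic log-density that follows an exact power law (hypothetical values).
gamma_true <- 2.5
x <- seq(1, 25, length.out = 95)        # degree grid, similar in shape to the question's x
y <- log(0.05) - gamma_true * log(x)    # natural log of the density

# Log-log regression: the slope on log(x) is minus the exponent.
fit       <- lm(y ~ log(x))
gamma_hat <- -unname(coef(fit)["log(x)"])
gamma_hat
```

On the question's data frame the analogous call would be `lm(y ~ log(x), data = df)`. Two caveats: the posted y values first rise and then fall, so a single straight line over the whole range is at best a crude summary, and the exponent of a density is not the rank-size parameter alpha itself (under the usual correspondence the density exponent equals 1 + 1/alpha).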
