
How to compute closeness centrality measure on a network with disconnected components in R?

I want to compute a closeness centrality measure on a network with disconnected components. The closeness function in igraph does not give meaningful results on such graphs.

Then I came across this site, which explains that closeness can also be measured on graphs with disconnected components.

The following code is what is suggested to achieve this:

# Load tnet
library(tnet)
 
# Load network 
# Node K is assigned node id 8 instead of 10 as isolates at the end of id sequences are not recorded in edgelists
net <- cbind(
  i=c(1,1,2,2,2,3,3,3,4,4,4,5,5,6,6,7,9,10,10,11),
  j=c(2,3,1,3,5,1,2,4,3,6,7,2,6,4,5,4,10,9,11,10),
  w=c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1))
 
# Calculate measures
closeness_w(net, gconly=FALSE)

In my case, I have transaction data, so the network I build from it is directed and weighted. The weights are 1/(transaction amount).

This is my example data:

df <- structure(list(id = c(2557L, 1602L, 18669L, 35900L, 48667L, 51341L
), from = c("5370", "6390", "5370", "5370", "8934", "5370"), 
    to = c("5636", "5370", "8933", "8483", "5370", "7626"), date = structure(c(13099, 
    13113, 13117, 13179, 13238, 13249), class = "Date"), amount = c(2921, 
    8000, 169.2, 71.5, 14.6, 4214)), row.names = c(NA, -6L), class = "data.frame")

I use the following code to achieve what I want:

library(dplyr)
library(tnet)

df2 <- select(df, c(from, to, amount)) %>%
  group_by(from, to) %>%
  mutate(weights = 1/sum(amount)) %>%
  select(-amount) %>%
  distinct()

network <- cbind(df2$from, df2$to, df2$weights)

# This call fails with:
# "Error in net[, "w"]^alpha : non-numeric argument to binary operator"
cl <- closeness_w(network, directed = TRUE, gconly = FALSE)

# So I convert the from and to columns to integers to solve the error above
df2$from <- as.integer(df2$from)
df2$to <- as.integer(df2$to)
# Then I run the code again
network <- cbind(df2$from, df2$to, df2$weights)
cl <- closeness_w(network, directed = TRUE, gconly = FALSE)
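The "non-numeric argument" error is consistent with how cbind handles mixed types: combining character and numeric vectors yields a character matrix, so the weight column is no longer numeric. A minimal base R sketch of that behaviour, using two of the node ids from the question and made-up weights (the names m_chr and m_num are only illustrative):

```r
# Two node ids from the question stored as strings; the weights are made up.
from    <- c("5370", "6390")
to      <- c("5636", "5370")
weights <- c(0.5, 0.25)

# Mixing character and numeric vectors coerces the whole matrix to character.
m_chr <- cbind(from, to, weights)

# Converting the ids first keeps every column numeric.
m_num <- cbind(as.integer(from), as.integer(to), weights)

is.character(m_chr)
is.numeric(m_num)
```

Checking the matrix with is.numeric before passing it to a function that does arithmetic on the weight column is a cheap way to catch this early.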

However, the output is not like the one on the website, which consists only of closeness scores for each node. Instead it contains many rows with a value of 0, and I don't know why.

The output I got is as follows:

     node  closeness    n.closeness
   [1,]    1 0.00000000 0.000000000000
   [2,]    2 0.00000000 0.000000000000
   [3,]    3 0.00000000 0.000000000000
   [4,]    4 0.00000000 0.000000000000
   [5,]    5 0.00000000 0.000000000000
   ...........................................................
 [330,]  330 0.00000000 0.000000000000
 [331,]  331 0.00000000 0.000000000000
 [332,]  332 0.00000000 0.000000000000
 [333,]  333 0.00000000 0.000000000000
 [ reached getOption("max.print") -- omitted 8600 rows ]

Also, the entries in the i and j columns of the data given on the website are reciprocal, that is, 1->2 exists iff 2->1 exists. My data is not like that: in my data 5370 sent money to 5636, but 5636 has not sent any money to 5370. So how can I compute the closeness measure correctly on such a directed network of transaction data? Has anyone tried a similar computation before?

EDIT: Since the closeness_w function treats weights as strength rather than as distance, I should have defined weights as sum(amount) instead of 1/sum(amount).
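A minimal base R sketch of that correction, assuming the goal is one row per directed pair with the total amount as the tie strength. The toy transactions and the names tx and ties are hypothetical and not taken from the data above:

```r
# Toy transactions (hypothetical values); the pair a -> b occurs twice.
tx <- data.frame(
  from   = c("a", "a", "b"),
  to     = c("b", "b", "c"),
  amount = c(10, 5, 2)
)

# One row per directed pair, with the summed amount as the tie strength.
ties <- aggregate(amount ~ from + to, data = tx, FUN = sum)
names(ties)[names(ties) == "amount"] <- "weights"

ties
```

The same aggregation can be written with group_by and summarise in dplyr; the only change from the original pipeline is that the sum is used directly instead of its reciprocal.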

The reason you get many rows with zero values is that closeness_w returns a closeness value for every node id from 1 to 8934 (the maximum value in your matrix). If you filter for the values in your data frame, you'll find the values you're looking for:

cl <- closeness_w(df2, directed = TRUE, gconly = FALSE)
cl[cl[, "node"] %in% c(df2$from), ]

     node  closeness  n.closeness
[1,] 5370 1.37893704 1.543644e-04
[2,] 6390 0.03668555 4.106745e-06
[3,] 8934 5.80008056 6.492870e-04

The direction has been accounted for: if you filter for the 'to' nodes, you'll see that only 5370 has a value:

cl[cl[, "node"] %in% c(df2$to), ]

     node closeness  n.closeness
[1,] 5370  1.378937 0.0001543644
[2,] 5636  0.000000 0.0000000000
[3,] 7626  0.000000 0.0000000000
[4,] 8483  0.000000 0.0000000000
[5,] 8933  0.000000 0.0000000000

If you go back to the example you're following and remove nodes from the middle of the data, you'll see that it gives zeros for the missing nodes. Also try setting directed = F and you'll notice the difference.

Update:

If you want an alternative way to create your network, you can pass df2 straight into the closeness_w function after creating it. Your node labels will become indices and the node column gets reduced to 1:n:

df2 <- df %>%
  group_by(from, to) %>%
  mutate(weights = 1/sum(amount)) %>%
  select(from, to, weights) %>%
  distinct()

cl <- closeness_w(df2, directed = TRUE, gconly = FALSE)
cl

     node  closeness n.closeness
5370    1 1.37893704 0.229822840
5636    2 0.00000000 0.000000000
7626    3 0.00000000 0.000000000
8483    4 0.00000000 0.000000000
8933    5 0.00000000 0.000000000
6390    6 0.03668555 0.006114259
8934    7 5.80008056 0.966680093
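If you prefer to control the relabelling yourself, the same idea can be sketched in base R: map every label to a consecutive id before building the matrix, and keep a lookup table to translate results back. This is only an illustration under the assumption that a gap-free 1:n id range is what you want. The names edges, labels, net and lookup are mine, and the w column holds the raw amounts.

```r
# Edge list from the question (node labels are arbitrary account ids).
edges <- data.frame(
  from   = c("5370", "6390", "5370", "5370", "8934", "5370"),
  to     = c("5636", "5370", "8933", "8483", "5370", "7626"),
  amount = c(2921, 8000, 169.2, 71.5, 14.6, 4214),
  stringsAsFactors = FALSE
)

# Collect every label that appears as a sender or a receiver.
labels <- sort(unique(c(edges$from, edges$to)))

# Map each label to its position in `labels`, giving ids 1..n with no gaps.
net <- cbind(
  i = match(edges$from, labels),
  j = match(edges$to, labels),
  w = edges$amount
)

# Lookup table to translate ids in any result back to the original labels.
lookup <- data.frame(node = seq_along(labels), label = labels)
```

With ids running from 1 to 7 instead of up to 8934, a per-node result has one row per actual node, and lookup$label[node] recovers the original account id.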

The webpage you quote does not explain that "closeness can be applied to disconnected networks". Instead, it proposes computing an entirely different quantity than closeness.

What they compute is in fact known as global efficiency, and was proposed in this paper:

You will find implementations in some packages. I have implemented this for igraph as well, and it will be included in version 0.9 of C/igraph (presumably also in some version of R/igraph). It is already accessible from IGraph/M, which serves as igraph's Mathematica interface.
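To make the quantity concrete, here is a rough base R sketch on a toy directed graph with two components. It assumes the usual definition of global efficiency as the average inverse shortest-path distance over all ordered pairs of distinct nodes, with unreachable pairs contributing zero. The graph, the Floyd-Warshall loop and the variable names are illustrative only and are not the igraph or tnet implementation.

```r
# Directed toy graph with two components: 1 -> 2 -> 3 and 4 -> 5.
# d[i, j] holds the length of the edge i -> j, Inf where there is no edge.
n <- 5
d <- matrix(Inf, n, n)
diag(d) <- 0
d[1, 2] <- 1
d[2, 3] <- 1
d[4, 5] <- 1

# Floyd-Warshall: allow each node k in turn as an intermediate stop.
for (k in seq_len(n)) {
  d <- pmin(d, outer(d[, k], d[k, ], "+"))
}

# Inverse distances: unreachable pairs give 1 / Inf = 0.
inv <- 1 / d
diag(inv) <- 0  # drop self-distances

node_score <- rowSums(inv)              # per-node sum of inverse distances
efficiency <- sum(inv) / (n * (n - 1))  # average over all ordered pairs
```

Because unreachable pairs add zero rather than an infinite distance, the score stays finite on disconnected graphs, which is why it behaves sensibly where ordinary closeness does not.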
