How to find the degree centrality of nodes in a matrix?

How does one find the degree centrality of nodes in a table like the following?

article   users
         u1  u2  u3  u4 u5 u6 u7
 1        1   1   1   0  0  0  0
 2        0   1   0   1  1  0  0
 3        1   0   0   1  0  1  1

This is just an example of my data. I have a very large file consisting of 1533 articles and about 52000 users.

I want to find the centrality of the articles and the centrality of the users in the matrix.

Degree centrality simply counts the number of other nodes that each node is "connected" to. So to do this for users, for example, we have to define what it means to be connected to another user. The simplest approach asserts a connection if a user has at least one article in common with another user. A slightly more complex (and probably better) approach weights connectivity by the number of articles in common. So if user 1 has 10 articles in common with user 2 and 3 articles in common with user 3, we say that user 1 is "more connected" to user 2 than to user 3. In what follows, I'll use the latter approach.
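To make the two definitions concrete, here is a minimal sketch that applies both to the 3 x 7 table from the question. The names `M.ex`, `w` and `b` are made up for this illustration, and only base R is used.

```r
# The 3 x 7 example from the question: rows are articles, columns are users
M.ex <- matrix(c(1, 1, 1, 0, 0, 0, 0,
                 0, 1, 0, 1, 1, 0, 0,
                 1, 0, 0, 1, 0, 1, 1),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("a", 1:3), paste0("u", 1:7)))

# Weighted connection: entry [i, j] is the number of articles users i and j share
w <- crossprod(M.ex)   # same as t(M.ex) %*% M.ex
diag(w) <- 0           # a user is not counted as connected to themselves

# Binary connection: 1 if the two users share at least one article
b <- (w > 0) * 1

deg.binary   <- rowSums(b)   # number of distinct users each user is connected to
deg.weighted <- rowSums(w)   # the same count, weighted by shared articles
```

In this tiny table no pair of users shares more than one article, so the two vectors coincide; on real data they would generally differ.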

This code creates a sample matrix with 15 articles and 30 users, sparsely connected. It then calculates a 30 x 30 adjacency matrix for users, where the [i,j] element is the number of articles user i has in common with user j. Then we create a weighted igraph object from this matrix and let igraph calculate the degree centrality.

Since degree centrality does not take the weights into account, we also calculate eigenvector centrality (which does take the weights into account). In this very simple example, the differences are subtle but instructive.
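As a side note, igraph also has `strength()`, which sums the edge weights at each node and is one way to get a weighted analogue of degree. The sketch below is only an illustration on a hypothetical 3-user adjacency matrix (`adj` and `g.ex` are made-up names) that mirrors the "10 articles vs. 3 articles" example above; it is not part of the analysis that follows.

```r
library(igraph)

# Hypothetical 3-user adjacency matrix: users 1 and 2 share 10 articles,
# users 1 and 3 share 3 articles, users 2 and 3 share none
adj <- matrix(c( 0, 10, 3,
                10,  0, 0,
                 3,  0, 0), nrow = 3, byrow = TRUE)

g.ex <- graph_from_adjacency_matrix(adj, weighted = TRUE, mode = "undirected")

deg  <- degree(g.ex)                   # counts edges only, ignores weights
strn <- strength(g.ex)                 # sums the "weight" edge attribute
ev   <- eigen_centrality(g.ex)$vector  # uses the "weight" attribute by default
```

Users 2 and 3 have the same degree here, but user 2 gets the larger strength and the larger eigenvector centrality because its single connection is heavier.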

# this just sets up the sample - you have the matrix M already
n.articles <- 15
n.users    <- 30
set.seed(1)    # for reproducibility
M <- matrix(sample(0L:1L, n.articles*n.users, prob=c(0.8,0.2), replace=TRUE), ncol=n.users)

# you start here...
# m.adj[i,j] = number of articles user i has in common with user j (i < j)
m.adj <- matrix(0L, ncol=n.users, nrow=n.users)
for (i in 1:(n.users-1)) {
  for (j in (i+1):n.users) {
    m.adj[i,j] <- sum(M[,i]*M[,j])
  }
}
library(igraph)
# only the upper triangle of m.adj is filled, hence mode="upper"
g <- graph_from_adjacency_matrix(m.adj, weighted=TRUE, mode="upper")
palette <- c("purple","blue","green","yellow","orange","red")
par(mfrow=c(1,2))
# degree centrality
c.d   <- degree(g)
# map the scores onto the 6 palette colors (1 = lowest, 6 = highest)
col <- as.integer(5*(c.d-min(c.d))/diff(range(c.d))+1)
set.seed(1)    # same layout in both plots
plot(g, vertex.color=palette[col], main="Degree Centrality",
     layout=layout_with_fr)

# eigenvector centrality
c.e   <- eigen_centrality(g)$vector
col <- as.integer(5*(c.e-min(c.e))/diff(range(c.e))+1)
set.seed(1)
plot(g, vertex.color=palette[col], main="Eigenvector Centrality",
     layout=layout_with_fr)

So in both cases node 15 has the highest centrality. However, node 28 has a higher degree centrality and a lower eigenvector centrality than node 27. This is because node 28 is connected to more nodes, but the strength of the connections is lower.

The same approach can of course be used to calculate article centrality; just use the transpose of M.
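As a small illustration of the transpose idea, the sketch below builds the article-by-article matrix for the 3 x 7 table from the question. `M.ex` and `a.adj` are made-up names; `tcrossprod(M.ex)` is base R shorthand for `M.ex %*% t(M.ex)`.

```r
# The 3 x 7 example from the question: rows are articles, columns are users
M.ex <- matrix(c(1, 1, 1, 0, 0, 0, 0,
                 0, 1, 0, 1, 1, 0, 0,
                 1, 0, 0, 1, 0, 1, 1),
               nrow = 3, byrow = TRUE)

# Entry [i, j] is the number of users that articles i and j have in common
a.adj <- tcrossprod(M.ex)
diag(a.adj) <- 0   # ignore each article's overlap with itself

a.deg <- rowSums(a.adj > 0)   # number of other articles each article is linked to
```

Here every article shares exactly one user with each of the other two, so all three articles end up with the same degree.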

This approach will not work with 52,000 users: the adjacency matrix would contain more than 2.5 billion elements. I'm not aware of a workaround for this; if someone else is, I'd like to hear it. So if you need to tabulate a centrality score for each of the 52,000 users, I can't help you. On the other hand, if you want to see patterns, it might be possible to carry out the analysis on a random sample of users (say, 10%).
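A minimal sketch of the sampling idea, assuming the data are available as an article-by-user 0/1 matrix. The matrix `M.big` below is random stand-in data with made-up dimensions, much smaller than the real 1533 x 52000 file, and all names are hypothetical.

```r
set.seed(1)   # for reproducibility

# Stand-in for the real data (hypothetical sizes and density)
n.articles <- 50
n.users    <- 2000
M.big <- matrix(rbinom(n.articles * n.users, 1, 0.05), ncol = n.users)

# Keep a random 10% of the users (columns)
keep  <- sort(sample(n.users, size = round(0.10 * n.users)))
M.sub <- M.big[, keep, drop = FALSE]

# Shared-article counts among the sampled users only
adj.sub <- crossprod(M.sub)
diag(adj.sub) <- 0

dim(adj.sub)   # 200 x 200 instead of 2000 x 2000
```

Keep in mind that centralities computed on `adj.sub` only see connections between sampled users, so they understate each user's degree in the full data and are better suited to spotting patterns than to reporting per-user scores.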
