
Finding Matches in a Directed Network in R

I am working with a directed data set in igraph, built from people who follow one another on Twitter. I have a data table:

follower_user_id | followed_user_id | gender_of_follower | gender_of_followed
               1 |                2 |                 F  |                  M 
               2 |                3 |                 M  |                  M
               3 |                2 |                 M  |                  M 

and so on.

I would like to assess the matches, that is, the cases in which users mutually follow one another, so that I can then look at who was not followed back by anyone and whether, for example, men are more likely to be followed back than women. But I am not sure how to filter for the matches in the first place.

So far, I think the best way to do this is to build a matrix of all user ids against all user ids and count the number of times each pair appears together.

M <- table (df$follower_user_id, df$followed_user_id)
follower.followed.matrix <- M %*% t(M)


XXXXX 1    2   3 
1     0    1   0
2     0    0   1
3     0    1   0 

But I am unsure how to merge the entries where the converse pairing also exists (e.g. where the 2-3 pairing = 2). Is it possible to use the 'reshape2' package to melt for a directed network?

I am thinking the best approach is to do it this way and then merge the gender data into a new data.table of matches, but I am open to suggestions for how to filter this data more efficiently. I am still new to R, so any help is appreciated.

Getting the matrix that you need (the adjacency matrix) is built into igraph. I will use a slightly bigger example than yours to make sure that the solution handles all cases.

Data: Enlarged follower network

FOL = read.table(text="follower_user_id followed_user_id gender_of_follower gender_of_followed
  1  2  F  M 
  2  3  M  M
  3  2  M  M 
  4  2  M  M
  2  4  M  M
  2  5  M  F
  4  5  M  F",
header=TRUE)

library(igraph)

## turn it into a graph and compute the adjacency matrix
g = graph_from_edgelist(as.matrix(FOL[,1:2]))
AM = as.matrix(as_adjacency_matrix(g))
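
A side note that may matter for real Twitter data: graph_from_edgelist() on a numeric matrix treats the numbers as vertex indices, so ids that do not run from 1 to n would create a vertex for every number up to the largest id. The following is a minimal sketch of one way around that, assuming the ids can be treated as labels. The ids in `edges` and the names `g2` and `AM2` are made up for illustration and are not part of the original data.

```r
library(igraph)

# Hypothetical edge list with non-consecutive user ids (illustration only)
edges <- data.frame(
  follower_user_id = c(101, 205, 990, 205),
  followed_user_id = c(205, 990, 205, 101)
)

# graph_from_data_frame() uses the first two columns as vertex names,
# so there is exactly one vertex per distinct id
g2  <- graph_from_data_frame(edges, directed = TRUE)
AM2 <- as.matrix(as_adjacency_matrix(g2))

# The matrix is 3 x 3 and its rows and columns are labelled by user id
dim(AM2)
rownames(AM2)
```

With named vertices, entries can be looked up by id, for example AM2["101", "205"].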

Now, with the adjacency matrix, you can quickly compute the pairs for which A follows B and B follows A.

sapply(1:5, function(x) { AM[x,] * AM[,x] })
     [,1] [,2] [,3] [,4] [,5]
[1,]    0    0    0    0    0
[2,]    0    0    1    1    0
[3,]    0    1    0    0    0
[4,]    0    1    0    0    0
[5,]    0    0    0    0    0

You can see that the needed pairs are (2,3), (2,4), (3,2) and (4,2). The matrix is symmetric, so each mutual pair shows up twice, once in each direction.
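
To carry this through to the follow-back question, here is a rough sketch that rebuilds the same example, lists each mutual pair once, and flags every row of the edge list as reciprocated or not. The names `mutual`, `pairs`, `followed_back`, `rate` and `never_back` are introduced here for illustration. It also assumes, as in the example, that user ids are the consecutive integers 1 to n, so they can be used directly as matrix indices.

```r
library(igraph)

FOL <- read.table(text = "follower_user_id followed_user_id gender_of_follower gender_of_followed
  1  2  F  M
  2  3  M  M
  3  2  M  M
  4  2  M  M
  2  4  M  M
  2  5  M  F
  4  5  M  F",
header = TRUE)

g  <- graph_from_edgelist(as.matrix(FOL[, 1:2]))
AM <- as.matrix(as_adjacency_matrix(g))

# Same result as the sapply() call: 1 only where both i -> j and j -> i exist
mutual <- AM * t(AM)

# Each mutual pair once (upper triangle only), as a two-column index matrix
pairs <- which(mutual == 1 & upper.tri(mutual), arr.ind = TRUE)

# For every follow in the edge list: did the followed user follow back?
FOL$followed_back <- mutual[as.matrix(FOL[, 1:2])] == 1

# Share of follows that were reciprocated, by gender of the follower
rate <- aggregate(followed_back ~ gender_of_follower, data = FOL, FUN = mean)

# Followers for whom none of their follows was reciprocated
never_back <- setdiff(FOL$follower_user_id,
                      FOL$follower_user_id[FOL$followed_back])
```

Here `pairs` holds the pairs (2,3) and (2,4), and `never_back` contains user 1. On real data the `rate` table is only a starting point; whether a difference between genders is meaningful would need a proper test.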


 