Subgraph text analysis in R (igraph)

I am curious how to access additional attributes of a graph that are associated with its edges. To follow along, here is a minimal example:

library("igraph")
library("SocialMediaLab")
library("magrittr")  # provides the %>% pipe used below

# Fill in your own Twitter API credentials
myapikey <- ''
myapisecret <- ''
myaccesstoken <- ''
myaccesstokensecret <- ''

# Authenticate, then collect 100 tweets matching the search term
tweets <- Authenticate("twitter",
                       apiKey = myapikey,
                       apiSecret = myapisecret,
                       accessToken = myaccesstoken,
                       accessTokenSecret = myaccesstokensecret) %>%
  Collect(searchTerm = "#trump", numTweets = 100, writeToFile = FALSE, verbose = TRUE)

# Build the actor network from the collected tweets
g_twitter_actor <- tweets %>% Create("Actor", writeToFile = FALSE)

# Keep only the largest weakly connected component
c <- igraph::components(g_twitter_actor, mode = 'weak')
subCluster <- induced.subgraph(g_twitter_actor, V(g_twitter_actor)[which(c$membership == which.max(c$csize))])

The initial tweets data frame contains the following columns:

colnames(tweets)
 [1] "text"            "favorited"       "favoriteCount"   "replyToSN"       "created_at"      "truncated"       "replyToSID"      "id"             
 [9] "replyToUID"      "statusSource"    "screen_name"     "retweetCount"    "isRetweet"       "retweeted"       "longitude"       "latitude"       
[17] "from_user"       "reply_to"        "users_mentioned" "retweet_from"    "hashtags_used"

How can I access the text property for the subgraph in order to perform text analysis? E(subCluster)$text does not work.

E(subCluster)$text does not work because the values of tweets$text are not added to the graph when it is created, so you have to add them manually. It's a bit of a pain, but doable. It requires some subsetting of the tweets data frame and matching based on user names.

First, notice that the edge types are in a particular order: retweets, mentions, replies. The same text from a particular user can apply to all three of these, so I think it makes sense to add the text serially, one edge type at a time.

> unique(E(g_twitter_actor)$edgeType)
[1] "Retweet" "Mention" "Reply"  

Using dplyr and reshape2 makes this easier.

library(reshape2); library(dplyr)
#Make data frame for retweets, mentions, replies
rts <- tweets %>% filter(!is.na(retweet_from))
ms <- tweets %>% filter(users_mentioned!="character(0)")
rpls <- tweets %>% filter(!is.na(reply_to))

Since each entry of users_mentioned can contain a list of individuals, we have to unlist it. But we also want to keep each mentioned user associated with the user who mentioned them.

#Name each element in the users_mentioned list after the user who mentioned
names(ms$users_mentioned) <- ms$screen_name
ms <- melt(ms$users_mentioned) #melting creates one row per (mentioning user, mentioned user) pair

#Add the text (match() returns the first tweet found for each screen name)
ms$text <- tweets$text[match(ms$L1, tweets$screen_name)]
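
To see what melt() does to a named list, here is a small sketch on made-up data. The user names and the objects `mentions` and `mentions_long` are hypothetical and not taken from the question. Each mentioned user becomes one row, with the list name stored in the `L1` column, which is why the code above matches on ms$L1.

```r
library(reshape2)

# Hypothetical data: each list element holds the users mentioned in one tweet,
# and each name is the screen name of the user who wrote that tweet.
mentions <- list(alice = c("bob", "carol"), dave = "alice")

# melt() on a named list returns one row per mentioned user:
# column `value` holds the mentioned user, column `L1` holds the list name.
mentions_long <- melt(mentions)
print(mentions_long)
```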

Now add each of these to the network as an edge attribute by matching on the edge type.

E(g_twitter_actor)$text[E(g_twitter_actor)$edgeType %in% "Retweet"] <- rts$text
E(g_twitter_actor)$text[E(g_twitter_actor)$edgeType %in% "Mention"] <- ms$text
E(g_twitter_actor)$text[E(g_twitter_actor)$edgeType %in% "Reply"] <- rpls$text
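
These assignments rely on each data frame having exactly as many rows as there are edges of that type, in the same order. A minimal sketch of the same pattern on a toy graph, with a length check added as a precaution (the graph, the texts, and the names `edges`, `g`, `mention_text` and `is_mention` are all hypothetical):

```r
library(igraph)

# Hypothetical toy graph: three directed edges with an edgeType attribute.
edges <- data.frame(
  from     = c("alice", "alice", "bob"),
  to       = c("bob", "carol", "carol"),
  edgeType = c("Retweet", "Mention", "Mention"),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(edges, directed = TRUE)

# Hypothetical texts, one per "Mention" edge, in edge order.
mention_text <- c("hello @carol", "thanks @carol")

# Start with an all-NA attribute so edges without text are explicit.
E(g)$text <- NA_character_

# Check that the number of texts matches the number of edges of this type
# before assigning, so a mismatch fails loudly instead of being recycled.
is_mention <- E(g)$edgeType == "Mention"
stopifnot(sum(is_mention) == length(mention_text))
E(g)$text[is_mention] <- mention_text

print(E(g)$text)
```

If the check fails on real data, the filtered data frame and the edges of that type are out of step, and the text should be matched on user names instead of position.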

Now you can take the subgraph again and read the text from its edges with E(subCluster)$text.

subCluster <- induced.subgraph(g_twitter_actor, 
                           V(g_twitter_actor)[which(c$membership == which.max(c$csize))])
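
Edge attributes set on the full graph are carried over to induced subgraphs, so the text can then be analysed directly. A rough sketch on a toy graph, where all names and tweet texts are made up and the word count stands in for whatever text analysis you actually want to run:

```r
library(igraph)

# Hypothetical toy graph with two weakly connected components.
edges <- data.frame(
  from = c("alice", "bob", "dave"),
  to   = c("bob", "carol", "erin"),
  text = c("great rally", "great speech", "unrelated tweet"),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(edges, directed = TRUE)

# Keep only the largest weakly connected component.
comp <- components(g, mode = "weak")
sub  <- induced_subgraph(g, which(comp$membership == which.max(comp$csize)))

# Edge attributes survive subsetting, so the text is available on the subgraph.
sub_text <- E(sub)$text

# A very simple text analysis: word frequencies across the subgraph's edges.
words <- unlist(strsplit(tolower(sub_text), "\\s+"))
word_counts <- sort(table(words), decreasing = TRUE)
print(word_counts)
```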
