
Time varying network in r

I have data on the interactions that could happen at a university club's weekly social hour.

A sample of my data is as follows:

structure(list(from = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", 
"B", "C"), class = "factor"), to = structure(c(2L, 3L, 2L, 3L, 
2L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("A", 
"B", "C"), class = "factor"), timestalked = c(0L, 1L, 0L, 4L, 
1L, 2L, 0L, 1L, 0L, 2L, 1L, 0L, 1L, 2L, 1L, 0L, 0L, 0L), week = structure(c(1L, 
1L, 3L, 3L, 2L, 2L, 1L, 1L, 3L, 3L, 2L, 2L, 1L, 1L, 3L, 3L, 2L, 
2L), .Label = c("1/1/2010", "1/15/2010", "1/8/2010"), class = "factor")), .Names = c("from", 
"to", "timestalked", "week"), class = "data.frame", row.names = c(NA, 
-18L))

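I am trying to calculate network statistics such as centrality for A, B, and C for each individual week, for the last two weeks, and for the year to date. The only way I have gotten this to work is by manually splitting the file into the time units I want to analyze, but I hope there is a less laborious way.

When timestalked is 0, this should be treated as no edge.

The output would produce a .csv with the following: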

actor  cent_week1 cent_week2 cent_week3 cent_last2weeks cent_yeartodate
 A       
 B
 C 

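cent_week1 is the centrality for 1/1/2010; cent_last2weeks considers only 1/8/2010 and 1/15/2010; cent_yeartodate considers all of the data at once. This is being applied to a much larger dataset of millions of observations.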

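I can't comment, so I'm writing an "answer". If you want to perform some mathematical operation on timestalked and get the values by from (I couldn't find any variable named actor in your example), here is a data.table approach that may help: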

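library(data.table)

dat <- as.data.table(dat) # or add 'data.table' to the class parameter
dat$week <- as.Date(dat$week, format = "%m/%d/%Y") # parse the m/d/Y strings as dates
# mean of timestalked per sender and week of the year
dat[, .(cent = mean(timestalked)), by = list(from, weeknum = week(week))]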

This gives the following output:

   from weeknum cent
1:    A       1  0.5
2:    A       2  2.0
3:    A       3  1.5
4:    B       1  0.5
5:    B       2  1.0
6:    B       3  0.5
7:    C       1  1.5
8:    C       2  0.5
9:    C       3  0.0

Assign this to new_dat. You can then filter by week with new_dat[weeknum %in% 2:3], or any other variation you want, or take the sum over the year. You can also sort/order the result as needed.
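As a minimal sketch of the filtering and summing described above: the sample data is rebuilt here in a compact form, and the names last2 and ytd are only illustrative, not part of the original answer.

```r
library(data.table)

# Compact rebuild of the question's sample data (same 18 rows)
dat <- data.table(
  from = rep(c("A", "B", "C"), each = 6),
  to   = c(rep(c("B", "C"), 3), rep(c("A", "C"), 3), rep(c("A", "B"), 3)),
  timestalked = c(0, 1, 0, 4, 1, 2, 0, 1, 0, 2, 1, 0, 1, 2, 1, 0, 0, 0),
  week = as.Date(rep(rep(c("2010-01-01", "2010-01-08", "2010-01-15"), each = 2), 3))
)

# Per-sender, per-week mean, assigned to new_dat as suggested
new_dat <- dat[, .(cent = mean(timestalked)), by = .(from, weeknum = week(week))]

# Keep only the last two weeks (week numbers 2 and 3 in this sample)
last2 <- new_dat[weeknum %in% 2:3]

# Year-to-date total per sender, sorted from largest to smallest
ytd <- dat[, .(total = sum(timestalked)), by = from][order(-total)]
```

Note that these are plain summaries of timestalked per sender, not graph centralities.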

Hope this helps!

How about:

library(dplyr)
centralities <- tmp       %>% 
  group_by(week)          %>% 
  filter(timestalked > 0) %>% 
  do(
    week_graph=igraph::graph_from_edgelist(as.matrix(cbind(.$from, .$to)))
  )                       %>% 
  do(
    ecs = igraph::eigen_centrality(.$week_graph)$vector
  )                       %>% 
  summarise(ecs_A = ecs[[1]], ecs_B = ecs[[2]], ecs_C = ecs[[3]])

If you have a lot of actors, you can use summarise_all. Putting it in long format is left as an exercise.
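One possible take on the long-format exercise, as a sketch rather than the answerer's own solution: it swaps do() for group_modify() (so it assumes a dplyr version that provides it) and builds each weekly graph with igraph::graph_from_data_frame so the actor names are kept. The column names actor and ecs are illustrative.

```r
library(dplyr)

# Compact rebuild of the question's sample data (same 18 rows)
tmp <- data.frame(
  from = rep(c("A", "B", "C"), each = 6),
  to   = c(rep(c("B", "C"), 3), rep(c("A", "C"), 3), rep(c("A", "B"), 3)),
  timestalked = c(0, 1, 0, 4, 1, 2, 0, 1, 0, 2, 1, 0, 1, 2, 1, 0, 0, 0),
  week = as.Date(rep(rep(c("2010-01-01", "2010-01-08", "2010-01-15"), each = 2), 3)),
  stringsAsFactors = FALSE
)

centralities_long <- tmp %>%
  filter(timestalked > 0) %>%          # a count of 0 means no edge
  group_by(week) %>%
  group_modify(~ {
    # one graph per week; .x holds that week's rows without the week column
    g  <- igraph::graph_from_data_frame(as.data.frame(.x[, c("from", "to")]))
    ec <- igraph::eigen_centrality(g)$vector
    data.frame(actor = names(ec), ecs = unname(ec))
  }) %>%
  ungroup()
```

The result has one row per week and actor, which is easier to extend to many actors than one column per actor.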

This can be done by setting up the windows in another table and then doing a by-group operation on each window:

Data preparation:

# Load Data
DT <- structure(list(from = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", 
"B", "C"), class = "factor"), to = structure(c(2L, 3L, 2L, 3L, 
2L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("A", 
"B", "C"), class = "factor"), timestalked = c(0L, 1L, 0L, 4L, 
1L, 2L, 0L, 1L, 0L, 2L, 1L, 0L, 1L, 2L, 1L, 0L, 0L, 0L), week = structure(c(1L, 
1L, 3L, 3L, 2L, 2L, 1L, 1L, 3L, 3L, 2L, 2L, 1L, 1L, 3L, 3L, 2L, 
2L), .Label = c("1/1/2010", "1/15/2010", "1/8/2010"), class = "factor")), .Names = c("from", 
"to", "timestalked", "week"), class = "data.frame", row.names = c(NA, 
-18L))

# Code
library(igraph)
library(data.table)

setDT(DT)

# setup events
DT <- DT[timestalked > 0]
DT[, week := as.Date(week, format = "%m/%d/%Y")]

# setup windows, edit as needed
date_ranges <- data.table(label = c("cent_week_1","cent_week_2","cent_last2weeks","cent_yeartodate"),
                          week_from = as.Date(c("2010-01-01","2010-01-08","2010-01-08","2010-01-01")),
                          week_to = as.Date(c("2010-01-01","2010-01-08","2010-01-15","2010-01-15"))
)

# find all events within windows
DT[, JA := 1]
date_ranges[, JA := 1]
graph_base <- merge(DT, date_ranges, by = "JA", allow.cartesian = TRUE)[week >= week_from & week <= week_to]
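For a dataset with millions of rows, the full cross join above may become expensive. A possible alternative, offered only as a sketch, is a data.table non-equi join that matches events to windows directly. The sample data is rebuilt compactly here, and x.week is used to keep the event's own date.

```r
library(data.table)

# Compact rebuild of the sample data, keeping only real edges
DT <- data.table(
  from = rep(c("A", "B", "C"), each = 6),
  to   = c(rep(c("B", "C"), 3), rep(c("A", "C"), 3), rep(c("A", "B"), 3)),
  timestalked = c(0, 1, 0, 4, 1, 2, 0, 1, 0, 2, 1, 0, 1, 2, 1, 0, 0, 0),
  week = as.Date(rep(rep(c("2010-01-01", "2010-01-08", "2010-01-15"), each = 2), 3))
)[timestalked > 0]

# Same windows as in the answer
date_ranges <- data.table(
  label     = c("cent_week_1", "cent_week_2", "cent_last2weeks", "cent_yeartodate"),
  week_from = as.Date(c("2010-01-01", "2010-01-08", "2010-01-08", "2010-01-01")),
  week_to   = as.Date(c("2010-01-01", "2010-01-08", "2010-01-15", "2010-01-15"))
)

# Non-equi join: each window picks up the events whose week falls inside it
graph_base <- DT[date_ranges,
                 on = .(week >= week_from, week <= week_to),
                 .(label, from, to, timestalked, week = x.week),
                 nomatch = NULL, allow.cartesian = TRUE]

graph_base[, .N, by = label]  # number of edges per window
```

This avoids the helper column JA and never materializes event/window pairs that fall outside a window.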

Now the by-group code. The second line is a bit clumsy; I am open to ideas on how to avoid the double call.

graph_base <- graph_base[, .(graphs = list(graph_from_data_frame(.SD))), by = label, .SDcols = c("from", "to", "timestalked")] # create graphs
graph_base <- graph_base[, .(vertex = names(eigen_centrality(graphs[[1]])$vector), ec = eigen_centrality(graphs[[1]])$vector), by = label] # calculate centrality
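On the double call: one option, sketched here on a two-window subset of the sample edges, is to wrap the j expression in braces so that eigen_centrality is evaluated once per group and its result reused. The table names edges, graph_tbl, and cent are illustrative.

```r
library(igraph)
library(data.table)

# Edges for two of the windows, taken from the sample data (timestalked > 0)
edges <- data.table(
  label = rep(c("cent_week_1", "cent_week_2"), c(4, 3)),
  from  = c("A", "B", "C", "C", "A", "B", "C"),
  to    = c("C", "C", "A", "B", "C", "C", "A"),
  timestalked = c(1, 1, 1, 2, 4, 2, 1)
)

# One graph per window, stored in a list column
graph_tbl <- edges[, .(graphs = list(graph_from_data_frame(.SD))),
                   by = label, .SDcols = c("from", "to", "timestalked")]

# Compute the centrality once per group, then return both columns from it
cent <- graph_tbl[, {
  ec <- eigen_centrality(graphs[[1]])$vector
  .(vertex = names(ec), ec = ec)
}, by = label]
```

The braces let j hold several statements; only the final list is returned as the group's result.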

dcast for the final formatting:

dcast(graph_base, vertex ~ label, value.var = "ec")
   vertex cent_last2weeks cent_week_1 cent_week_2 cent_yeartodate
1:      A       1.0000000   0.7071068   0.8944272       0.9397362
2:      B       0.7052723   0.7071068   0.4472136       0.7134685
3:      C       0.9008487   1.0000000   1.0000000       1.0000000

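This analysis follows the general split-apply-combine approach: the data are split by week, graph functions are applied, and the results are then combined. There are several tools for this, but the code below uses base R and data.table.

Base R

First set the date class for your data, so that the term "last two weeks" has meaning.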

# Set date class and order
d$week <- as.Date(d$week, format="%m/%d/%Y")
d <- d[order(d$week), ]
d <- d[d$timestalked > 0, ] # remove zero-count edges; no need to do this if using weights

Then split and apply the graph functions:

library(igraph)

# split data and form graph for each week
g1 <- lapply(split(seq(nrow(d)), d$week), function(i)
  graph_from_data_frame(d[i, ]))
# you can then run graph functions to extract specific measures
(grps <- sapply(g1, function(x)
  eigen_centrality(x, weights = E(x)$timestalked)$vector))

#   2010-01-01 2010-01-08 2010-01-15
# A  0.5547002  0.9284767  1.0000000
# B  0.8320503  0.3713907  0.7071068
# C  1.0000000  1.0000000  0.7071068

# Aside: If you only have one function to run on the graphs, 
# you could do this in one step
# 
# sapply(split(seq(nrow(d)), d$week), function(i) {
#             x = graph_from_data_frame(d[i,])
#             eigen_centrality(x, weights = E(x)$timestalked)$vector
#           })

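Then you need to combine this with the analysis on all of the data. Since you only have two further graphs to build, this is not the time-consuming part.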

fun1 <- function(i, name) {
            x = graph_from_data_frame(i)
            d = data.frame(eigen_centrality(x, weights = E(x)$timestalked)$vector)
            setNames(d, name)
    }


a = fun1(d, "alldata")
lt = fun1(d[d$week %in% tail(unique(d$week), 2), ], "lasttwo")

# Combine: could use `cbind` in this example, but perhaps `merge` is 
# safer if there are different levels between dates
data.frame(grps, lt, a) # or
Reduce(merge, lapply(list(grps, a, lt), function(x) data.frame(x, nms = row.names(x))))

#   nms X2010.01.01 X2010.01.08 X2010.01.15  alldata lasttwo
# 1   A   0.5547002   0.9284767   1.0000000 0.909899     1.0
# 2   B   0.8320503   0.3713907   0.7071068 0.607475     0.5
# 3   C   1.0000000   1.0000000   0.7071068 1.000000     1.0

data.table

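The time-consuming step will most likely be explicitly splitting the data and applying the function to each piece. data.table should offer some benefit here, especially as the data get larger and/or there are more groups.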

# function to apply to graph
fun <- function(d) {
  x = graph_from_data_frame(d)
  e = eigen_centrality(x, weights = E(x)$timestalked)$vector
  list(e, names(e))
}

library(data.table)
dcast(
  setDT(d)[, fun(.SD), by=week], # apply function - returns data in  long format
  V2 ~ week, value.var = "V1")   # convert to wide format

#    V2 2010-01-01 2010-01-08 2010-01-15
# 1:  A  0.5547002  0.9284767  1.0000000
# 2:  B  0.8320503  0.3713907  0.7071068
# 3:  C  1.0000000  1.0000000  0.7071068

Then, as before, run the function on the full data and on the last two weeks.
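Putting those pieces together might look like the sketch below. It reuses the weighted function from above (returning named list elements instead of V1 and V2), rebuilds the sample data compactly, and stacks the weekly, last-two-weeks, and all-data results before casting to wide format. The names period, vertex, cent, and wide are illustrative.

```r
library(igraph)
library(data.table)

# Compact rebuild of the sample data, keeping only real edges
d <- data.table(
  from = rep(c("A", "B", "C"), each = 6),
  to   = c(rep(c("B", "C"), 3), rep(c("A", "C"), 3), rep(c("A", "B"), 3)),
  timestalked = c(0, 1, 0, 4, 1, 2, 0, 1, 0, 2, 1, 0, 1, 2, 1, 0, 0, 0),
  week = as.Date(rep(rep(c("2010-01-01", "2010-01-08", "2010-01-15"), each = 2), 3))
)[timestalked > 0]

# Weighted eigenvector centrality for one set of edges
fun <- function(edges) {
  x <- graph_from_data_frame(edges)
  e <- eigen_centrality(x, weights = E(x)$timestalked)$vector
  list(vertex = names(e), cent = unname(e))
}

cols <- c("from", "to", "timestalked")
last_two <- tail(sort(unique(d$week)), 2)

weekly  <- d[, fun(.SD), by = week, .SDcols = cols]
lasttwo <- d[week %in% last_two, fun(.SD), .SDcols = cols]
alldata <- d[, fun(.SD), .SDcols = cols]

# Stack in long format, then cast to one column per period
long <- rbind(
  weekly[,  .(vertex, period = as.character(week), cent)],
  lasttwo[, .(vertex, period = "lasttwo", cent)],
  alldata[, .(vertex, period = "alldata", cent)]
)
wide <- dcast(long, vertex ~ period, value.var = "cent")
```

From here, fwrite(wide, "centrality.csv") would produce the requested .csv; the file name is only a placeholder.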

The differences between the answers come down to the weights argument of eigen_centrality: this answer uses weights, whereas the others do not.
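To see the effect of the weights, here is a small sketch using only the 2010-01-01 edges from the sample data. The data frame wk1 is written out by hand for illustration.

```r
library(igraph)

# Edges for the week of 2010-01-01, with zero-count rows already dropped
wk1 <- data.frame(
  from = c("A", "B", "C", "C"),
  to   = c("C", "C", "A", "B"),
  timestalked = c(1, 1, 1, 2)
)
g <- graph_from_data_frame(wk1)

# Without weights, every edge counts equally
unweighted <- eigen_centrality(g)$vector

# With weights, each edge counts in proportion to timestalked
weighted <- eigen_centrality(g, weights = E(g)$timestalked)$vector

round(rbind(unweighted, weighted), 4)
```

The unweighted row should correspond to the cent_week_1 column of the window-table answer, and the weighted row to the 2010-01-01 column of this answer.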


d=structure(list(from = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", 
"B", "C"), class = "factor"), to = structure(c(2L, 3L, 2L, 3L, 
2L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("A", 
"B", "C"), class = "factor"), timestalked = c(0L, 1L, 0L, 4L, 
1L, 2L, 0L, 1L, 0L, 2L, 1L, 0L, 1L, 2L, 1L, 0L, 0L, 0L), week = structure(c(1L, 
1L, 3L, 3L, 2L, 2L, 1L, 1L, 3L, 3L, 2L, 2L, 1L, 1L, 3L, 3L, 2L, 
2L), .Label = c("1/1/2010", "1/15/2010", "1/8/2010"), class = "factor")), .Names = c("from", 
"to", "timestalked", "week"), class = "data.frame", row.names = c(NA, 
-18L))
