
Distinguishing between internal and external ties in R

I have run into a challenging situation.
I have two separate data sets, df1 and df2: df1 is HR data and df2 is the email communication of a big company.
df1 columns: ID (the email address), dept (department), ...
df2 columns: sender (email address), receiver (email address); there can be multiple emails between the same two nodes.
PS: all isolates are removed, and there are no loops in the email communications.
I also created a graph object as follows:

g1 <- graph_from_data_frame(df2[, 1:2], directed = TRUE, vertices = df1)

Now I want to distinguish internal ties (both nodes are in the same dept) from external ties. I used the following code:

E(g1)$internal <- as.numeric(df1$dept[df2$sender] == df1$dept[df2$receiver])

But the result is all NAs. I know this happens because each part of that expression (df1$dept[df2$sender] or df1$dept[df2$receiver]) also returns only NAs. Could you please help me iron out this kink?

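You are almost there with your approach. The NAs appear because df1$dept is an unnamed vector: subscripting it with character strings (the email addresses in df2$sender and df2$receiver) makes R look for elements with those names, it finds none, and it returns NA. Naming the elements of the department vector with the IDs fixes this. The example code below, built on fake data, illustrates the idea.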
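To see the failure and the fix side by side with email-style IDs, here is a minimal sketch; the addresses and department names are made up for illustration. One extra caution: if sender and receiver were read in as factors, wrap them in as.character() before subscripting, because R indexes by a factor's integer codes, not by its printed labels.

```r
# Hypothetical departments and email-address IDs, made up for illustration
dept    <- c("Sales", "IT", "Sales")
ids     <- c("ann@corp.com", "bob@corp.com", "cy@corp.com")
senders <- c("bob@corp.com", "ann@corp.com")

# Without names, a character subscript has nothing to match, so every result is NA
unnamed.lookup <- dept[senders]

# After naming the elements by ID, the same subscript finds each department
named.dept   <- setNames(dept, ids)
named.lookup <- named.dept[senders]

print(unnamed.lookup)  # all NA
print(named.lookup)    # IT for bob, Sales for ann
```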

# I generate some fake data to mimic your problem
set.seed(125)
library(igraph)

# fake df1
df1 <- data.frame(ID = 1:9, dept = rep(LETTERS[1:3], 3), stringsAsFactors = FALSE)
df1
#>   ID dept
#> 1  1    A
#> 2  2    B
#> 3  3    C
#> 4  4    A
#> 5  5    B
#> 6  6    C
#> 7  7    A
#> 8  8    B
#> 9  9    C

# fake df2
df2 <- data.frame(sender = sample(1:9, 6, replace = TRUE), receiver = sample(1:9, 6, replace = TRUE))
df2
#>   sender receiver
#> 1      8        5
#> 2      2        3
#> 3      3        6
#> 4      4        6
#> 5      9        1
#> 6      9        7

# the graph
g <- graph_from_data_frame(df2)

# Your general approach works if the department codes
# are a named vector whose names are the IDs. Then a
# call like named.dept[c("1", "3", "2")] returns the
# departments of individuals 1, 3 and 2, in that order
named.dept <- df1$dept
names(named.dept) <- df1$ID

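# To see how it works. as.character() makes R look the IDs up
# by name; an integer subscript would index by position, which
# only gives the same answer here because the fake IDs are 1:9
named.dept[as.character(df2$sender)]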
#>   8   2   3   4   9   9 
#> "B" "B" "C" "A" "C" "C"

# Now using your code, with the IDs as character subscripts
E(g)$internal <- as.numeric(named.dept[as.character(df2$sender)] == named.dept[as.character(df2$receiver)])
E(g)$internal
#> [1] 1 0 1 0 0 0

Checking against the data, edges 8->5 and 3->6 are internal because 8 and 5 are both in 'B' and 3 and 6 are both in 'C', so the code does what we intended.
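Since the graph in the question is built with vertices = df1, each vertex already carries its dept as a vertex attribute, so another option is to compare the two ends of each edge inside igraph instead of looking departments up in df1. A minimal sketch using tail_of() and head_of(), with made-up addresses standing in for the real data; note that graph_from_data_frame() stops with an error if an edge endpoint is missing from df1$ID.

```r
library(igraph)

# Hypothetical HR table (df1) and email log (df2); addresses are made up
df1 <- data.frame(ID   = c("ann@corp.com", "bob@corp.com", "cy@corp.com", "dee@corp.com"),
                  dept = c("Sales", "IT", "Sales", "IT"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(sender   = c("ann@corp.com", "bob@corp.com", "cy@corp.com"),
                  receiver = c("cy@corp.com",  "ann@corp.com", "dee@corp.com"),
                  stringsAsFactors = FALSE)

# vertices = df1 stores dept as a vertex attribute, keyed by the ID column
g1 <- graph_from_data_frame(df2, directed = TRUE, vertices = df1)

# Compare the dept of each edge's start vertex (tail) with that of its end vertex (head)
E(g1)$internal <- as.numeric(tail_of(g1, E(g1))$dept == head_of(g1, E(g1))$dept)
E(g1)$internal  # expected: 1 0 0
```

This keeps the department lookup tied to the graph itself, so it stays correct even if edges are later reordered or filtered.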


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM