
Creating a tidy co-occurrence data frame: three columns for a co-occurrence network, built from a list of character vectors of unequal length

I need help structuring data for use with the R network package.

I have a list, author_list, in which each element is a character vector holding the authors of one document, e.g.:

document_authors1 = c("King, Stephen", "Martin, George", "Clancy, Tom")

document_authors2 = c("Clancy, Tom", "Patterson, James", "Stine, RL", "King, Stephen")

document_authors3 = c("Clancy, Tom", "Patterson, James", "Stine, RL", "King, Stephen")

author_list = list(document_authors1, document_authors2, document_authors3)

author_list

[[1]]
[1] "King, Stephen"  "Martin, George" "Clancy, Tom"

[[2]]
[1] "Clancy, Tom"      "Patterson, James" "Stine, RL"        "King, Stephen"

[[3]]
[1] "Clancy, Tom"      "Patterson, James" "Stine, RL"        "King, Stephen"

I need to create a data frame from author_list with three columns. Each row describes one pair of authors: col1 holds one author, col2 holds the other, and the third column, co-occurrence, gives how many times that pair appears together across the documents. For example:

      col1                     col2                            co-occurrence
1 King, Stephen           Patterson, James                           2
2 Martin, George             Clancy, Tom                             1

Etc…
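To pin down the counting rule described above: each document adds 1 to every unordered pair of its co-authors, so a document with n authors contributes choose(n, 2) pairs. A minimal base-R illustration for the first document (pairs1 is a hypothetical name, and the authors within a document are assumed to be distinct):

```r
# First document from the question
document_authors1 <- c("King, Stephen", "Martin, George", "Clancy, Tom")

# combn() returns one pair per column; t() turns that into one pair per row.
# Assumption: authors within a document are distinct, so each row is one
# co-occurrence of two different authors.
pairs1 <- t(combn(document_authors1, 2))

# Three pairs: King/Martin, King/Clancy, Martin/Clancy
pairs1
```

Summing these per-document pairs over all documents gives the co-occurrence column.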

I have tried to find a package function that does this, without luck. I've also tried to piece together a step-by-step solution, but it keeps eluding me. Hopefully it's easier than I think. Any advice or suggestions would be greatly appreciated.

I'm not entirely sure this is what you're after, but I hope it helps.

library(dplyr)

# Keep only documents with more than one author (combn() needs at least 2 elements)
author_list <- author_list[lengths(author_list) > 1]

# For each document, list every pair of authors (one pair per row)
mat <- do.call(rbind, lapply(author_list, function(x) t(combn(x, 2))))

# Sort each row alphabetically so that A/B and B/A count as the same pair
mat <- t(apply(mat, 1, sort))

# Count how often each pair of authors occurs
as.data.frame(mat) %>%
  group_by_all() %>%
  summarise(count = n())

# A tibble: 8 x 3
# Groups:   V1 [3]
  V1               V2               count
  <fct>            <fct>            <int>
1 Clancy, Tom      King, Stephen        3
2 Clancy, Tom      Martin, George       1
3 Clancy, Tom      Patterson, James     2
4 Clancy, Tom      Stine, RL            2
5 King, Stephen    Martin, George       1
6 King, Stephen    Patterson, James     2
7 King, Stephen    Stine, RL            2
8 Patterson, James Stine, RL            2
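If you would rather avoid dplyr and get the exact column names from the question (col1, col2, co-occurrence), a base-R sketch along the same lines is below. It is not the answer's code: it swaps group_by()/summarise() for table(), and it assumes (unlike the code above) that an author listed twice in one document should count only once. The names docs, pairs and edges are hypothetical.

```r
# Example data from the question
author_list <- list(
  c("King, Stephen", "Martin, George", "Clancy, Tom"),
  c("Clancy, Tom", "Patterson, James", "Stine, RL", "King, Stephen"),
  c("Clancy, Tom", "Patterson, James", "Stine, RL", "King, Stephen")
)

# Assumption: an author listed twice in one document counts once.
# Sorting each vector first means combn() emits every pair in
# alphabetical order, so "A/B" and "B/A" become the same pair.
docs <- lapply(author_list, function(x) sort(unique(x)))

# combn() needs at least two authors per document
docs <- docs[lengths(docs) > 1]

# One row per (document, author pair)
pairs <- do.call(rbind, lapply(docs, function(x) t(combn(x, 2))))

# Cross-tabulate the pairs and keep only pairs that actually occur
edges <- as.data.frame(table(col1 = pairs[, 1], col2 = pairs[, 2]),
                       stringsAsFactors = FALSE)
edges <- edges[edges$Freq > 0, ]

# Rename the count column to the name used in the question, then sort rows
names(edges)[3] <- "co-occurrence"
edges <- edges[order(edges$col1, edges$col2), ]
rownames(edges) <- NULL
edges
```

On the question's data this should give the same eight pairs and counts as the tibble above. One design caveat: table() builds the full col1 × col2 grid before the zero cells are dropped, so for a very large set of authors the grouped dplyr version, or aggregate(), may use less memory.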

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM