Create a matrix using the common information in two lists
I have two large lists with the same structure as the toy examples shown in this question.
dput(head(list1)):
list(FEB_GAMES = list(GAME1 = c("Stan", "Kenny", "Cartman", "Kyle",
"Butters"), GAME2 = c("Kenny", "Cartman", "Kyle", "Butters")),
MAR_GAMES = list(GAME3 = c("Stan", "Kenny", "Cartman", "Butters"
), GAME4 = c("Kenny", "Cartman", "Kyle", "Butters")))
dput(head(list2)):
list(first = c("Stan", "Kenny", "Cartman", "Kyle", "Butters",
"Kenny", "Cartman", "Kyle", "Butters"), second = c("Stan", "Kenny",
"Cartman", "Wendy", "Ike"), third = c("Randy", "Randy", "Randy",
"Randy"))
I would like to turn these two lists into one large data.frame/matrix. The row names would come from list1 (GAME1, GAME2, GAME3, GAME4). The column names would be the names of list2 (first, second, third). Each cell of the matrix would be an integer giving the number of times a common character is found in both lists, e.g. GAME1 x first contains 9 common characters, while GAME1 x third contains 0.
The output would look like this:
      first second third
GAME1     9      3     0
GAME2     8      2     0
GAME3     7      3     0
GAME4     8      2     0
So the value in [1,1] would be the number of times a common character is found in both the GAME1 vector from list1 and the first vector in list2.
Note: the vectors in both list1 and list2 have varying numbers of values.
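To make the counting rule concrete, a single cell could be computed as in the sketch below. It assumes list1 is a list of months whose elements are lists of games, and that the count is the number of entries in the list2 vector (repeats included) that also occur in the game, which is the reading consistent with GAME1 x first being 9. The helper name count_common is hypothetical.

```r
# Toy data with the nested structure assumed here: months -> games -> players.
list1 <- list(
  FEB_GAMES = list(
    GAME1 = c("Stan", "Kenny", "Cartman", "Kyle", "Butters"),
    GAME2 = c("Kenny", "Cartman", "Kyle", "Butters")
  ),
  MAR_GAMES = list(
    GAME3 = c("Stan", "Kenny", "Cartman", "Butters"),
    GAME4 = c("Kenny", "Cartman", "Kyle", "Butters")
  )
)
list2 <- list(
  first  = c("Stan", "Kenny", "Cartman", "Kyle", "Butters",
             "Kenny", "Cartman", "Kyle", "Butters"),
  second = c("Stan", "Kenny", "Cartman", "Wendy", "Ike"),
  third  = c("Randy", "Randy", "Randy", "Randy")
)

# Hypothetical helper: how many entries of `players` also appear in `game`.
count_common <- function(game, players) sum(players %in% game)

cell_11 <- count_common(list1$FEB_GAMES$GAME1, list2$first)  # 9
cell_13 <- count_common(list1$FEB_GAMES$GAME1, list2$third)  # 0
```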
An option would be to first flatten 'list1', do a merge after converting both lists to data.frame, and then do the table
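# Flatten list1 one level so that each element is a single game
list1a <- do.call(c, list1)
# Drop the "FEB_games." / "MAR_games." prefix from the names
names(list1a) <- sub(".*\\.", "", names(list1a))
# Stack both lists to long format, join on the player name, then cross-tabulate
out <- table(merge(stack(list1a), stack(list2), by = 'values')[-1])
names(dimnames(out)) <- NULL
out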
# first second third
#GAME1 9 3 0
#GAME2 8 2 0
#GAME3 7 3 0
#GAME4 8 2 0
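The intermediate steps of the stack/merge/table approach can be inspected one at a time. The sketch below uses the same toy data with list1 already flattened to one level; the variable names (games, groups, pairs, counts) are illustrative only and not part of the answer.

```r
# list1 flattened to one level: one character vector per game.
games <- list(
  GAME1 = c("Stan", "Kenny", "Cartman", "Kyle", "Butters"),
  GAME2 = c("Kenny", "Cartman", "Kyle", "Butters"),
  GAME3 = c("Stan", "Kenny", "Cartman", "Butters"),
  GAME4 = c("Kenny", "Cartman", "Kyle", "Butters")
)
groups <- list(
  first  = c("Stan", "Kenny", "Cartman", "Kyle", "Butters",
             "Kenny", "Cartman", "Kyle", "Butters"),
  second = c("Stan", "Kenny", "Cartman", "Wendy", "Ike"),
  third  = c("Randy", "Randy", "Randy", "Randy")
)

# stack() turns a named list into a long data.frame with two columns:
# `values` (the players) and `ind` (a factor holding the list element name).
games_long  <- stack(games)
groups_long <- stack(groups)

# merge() on `values` pairs each game row with every matching group row,
# so a player repeated in a group contributes one row per repeat.
pairs <- merge(games_long, groups_long, by = "values")

# Dropping `values` leaves ind.x (game) and ind.y (group). table() counts the
# pairs; "third" still appears because its factor level survives the merge.
counts <- table(pairs[-1])
print(counts)
```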
We can also do this in tidyverse using the same logic
library(tidyverse)
list1 %>%
  flatten %>%
  enframe %>%
  unnest %>%
  full_join(list2 %>%
              enframe %>%
              unnest, by = 'value') %>%
  select(-value) %>%
  count(name.x, name.y) %>%
  spread(name.y, n, fill = 0) %>%
  filter(!is.na(name.x))
# A tibble: 4 x 4
# name.x first second third
# <chr> <dbl> <dbl> <dbl>
#1 GAME1 9 3 0
#2 GAME2 8 2 0
#3 GAME3 7 3 0
#4 GAME4 8 2 0
list1 <- list(FEB_games = list(GAME1 = c("Stan", "Kenny", "Cartman", "Kyle",
"Butters"), GAME2 = c("Kenny", "Cartman", "Kyle", "Butters")),
MAR_games = list(GAME3 = c("Stan", "Kenny", "Cartman", "Butters"
), GAME4 = c("Kenny", "Cartman", "Kyle", "Butters")))
list2 <- list(first = c("Stan", "Kenny", "Cartman", "Kyle", "Butters",
"Kenny", "Cartman", "Kyle", "Butters"), second = c("Stan", "Kenny",
"Cartman", "Wendy", "Ike"), third = c("Randy", "Randy", "Randy",
"Randy"))
How about ...
sapply(list2, function(x) {
  sapply(unlist(list1, recursive = FALSE), function(y) sum(x %in% y))
})
# first second third
# FEB_games.GAME1 9 3 0
# FEB_games.GAME2 8 2 0
# MAR_games.GAME3 7 3 0
# MAR_games.GAME4 8 2 0
Might not be the most efficient approach, though.
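If the bare GAME1 to GAME4 row names from the question are wanted, the month prefix can be stripped before counting, using the same sub() pattern as the first answer. The sketch below is one possible variant, not a benchmarked improvement: it swaps sapply() for vapply(), which only adds a check that each call returns the expected number of integers. The names games and counts are illustrative.

```r
list1 <- list(
  FEB_games = list(
    GAME1 = c("Stan", "Kenny", "Cartman", "Kyle", "Butters"),
    GAME2 = c("Kenny", "Cartman", "Kyle", "Butters")
  ),
  MAR_games = list(
    GAME3 = c("Stan", "Kenny", "Cartman", "Butters"),
    GAME4 = c("Kenny", "Cartman", "Kyle", "Butters")
  )
)
list2 <- list(
  first  = c("Stan", "Kenny", "Cartman", "Kyle", "Butters",
             "Kenny", "Cartman", "Kyle", "Butters"),
  second = c("Stan", "Kenny", "Cartman", "Wendy", "Ike"),
  third  = c("Randy", "Randy", "Randy", "Randy")
)

# Flatten one level so each element is a single game's character vector.
games <- unlist(list1, recursive = FALSE)
# Remove the "FEB_games." / "MAR_games." prefix that unlist() adds to the names.
names(games) <- sub(".*\\.", "", names(games))

# One column per list2 element, one row per game.
counts <- vapply(list2, function(players) {
  vapply(games, function(game) sum(players %in% game), integer(1))
}, integer(length(games)))
print(counts)
```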