Lists or vectors within a column of a data frame: comparing each value and counting the matches in R
Col1 | Col2 |
---|---|
a | 3,2,20,6 |
b | 3,0,20,15 |
a | 3,2,20,50 |
b | 3,6,59,0 |
a | 3,4,20,6 |
I have two columns. Col2 is a character column, but I need to transform each of its values into a vector, for example `c("3", "2", "20", "6", "4", "64", "7", "65", "76", "26", "52", "67", "66", "76", "22")`, so that I can apply `%in%` and obtain the number of TRUE values. For example, I would like to run
`Table$Col3 <- Table$Col2[1] %in% Table$Col2` and get:
Col1 | Col2 | Col3 |
---|---|---|
a | 3,2,20,6 | |
b | 3,0,20,15 | true, false, true, false |
a | 3,2,20,50 | |
b | 3,6,59,0 | true, false, false, false |
a | 3,4,20,6 | true, false, true, false |
And finally count the number of TRUE values:
Col1 | Col2 | Col3 | Col4 |
---|---|---|---|
a | 3,2,20,6 | | |
b | 3,0,20,15 | true, false, true, false | 2 |
a | 3,2,20,50 | true, true, true, false | 3 |
b | 3,6,59,0 | true, false, false, false | 1 |
a | 3,4,20,6 | true, false, true, false | 2 |
But I cannot transform `Table$Col2[1]` into a vector or list. I always get the whole content between quotes, `"c("3", "2", "20", "6", "4", "64", "7", "65", "76", "26", "52", "67", "66", "76", "22")"`, as a single value, so the comparison is made between the entire lists rather than between the individual values inside them.
How can I solve that? It occurs to me that I could separate the values into additional columns formatted as integers and then join them to create the vector or list, but I think that would be very inefficient.
We may extract the first `list` element with `[[`, loop over the `list` column with `map`/`imap` (from `purrr`), create a logical vector with `%in%`, and get the count of TRUE values in `Col4` by taking the `sum` of that logical vector (`TRUE` -> 1 and `FALSE` -> 0).

    library(purrr)
    library(dplyr)

    df1 %>%
      # Col3: compare each row's vector with the first row's vector (row 1 itself gets NA)
      mutate(Col3 = imap(Col2, ~ if (.y == 1) NA else .x %in% Col2[[1]]),
             # Col4: number of TRUE values per row (TRUE counts as 1, NA is dropped)
             Col4 = map_dbl(Col3, ~ sum(.x, na.rm = TRUE)))
Output:

    # A tibble: 5 × 4
      Col1  Col2      Col3       Col4
      <chr> <list>    <list>    <dbl>
    1 a     <dbl [4]> <lgl [1]>     0
    2 b     <dbl [4]> <lgl [4]>     2
    3 a     <dbl [4]> <lgl [4]>     3
    4 b     <dbl [4]> <lgl [4]>     2
    5 a     <dbl [4]> <lgl [4]>     3
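
For readers who prefer not to load `purrr` and `dplyr`, the same idea (extract the reference vector with `[[`, apply `%in%` row by row, then `sum` the logical vector) can be sketched in base R. This is only an illustration, not part of the original answer: the name `ref` is introduced here for readability, and the data frame is rebuilt with `data.frame()` so that the block runs on its own.

```r
# Same values as the answer's data, built with base R only (Col2 is a list column).
df1 <- data.frame(Col1 = c("a", "b", "a", "b", "a"))
df1$Col2 <- list(c(3, 2, 20, 6), c(3, 0, 20, 15), c(3, 2, 20, 50),
                 c(3, 6, 59, 0), c(3, 4, 20, 6))

# Reference vector: `[[` returns the numeric vector itself,
# whereas `[` would return a list of length one.
ref <- df1$Col2[[1]]

# Compare every row against the reference; the first row is set to NA.
df1$Col3 <- lapply(seq_along(df1$Col2), function(i) {
  if (i == 1) NA else df1$Col2[[i]] %in% ref
})

# sum() counts TRUE as 1 and FALSE as 0; na.rm = TRUE turns the first row into 0.
df1$Col4 <- vapply(df1$Col3, sum, integer(1), na.rm = TRUE)

df1$Col4
```

Note that `%in%` tests membership, not position: a value counts as a match if it appears anywhere in the first row's vector.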
Data:

    df1 <- structure(list(
      Col1 = c("a", "b", "a", "b", "a"),
      Col2 = list(c(3, 2, 20, 6), c(3, 0, 20, 15), c(3, 2, 20, 50),
                  c(3, 6, 59, 0), c(3, 4, 20, 6))),
      class = c("tbl_df", "tbl", "data.frame"),
      row.names = c(NA, -5L))
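
The answer's data stores `Col2` as a list column, whereas the question starts from a character column such as `"3,2,20,6"`. A minimal sketch of that conversion step, assuming the values are separated by commas, uses base R's `strsplit()`. The `Table` object below is a hypothetical reconstruction of the question's data, not something supplied by the original poster.

```r
# Hypothetical reconstruction of the question's table: Col2 is plain character.
Table <- data.frame(
  Col1 = c("a", "b", "a", "b", "a"),
  Col2 = c("3,2,20,6", "3,0,20,15", "3,2,20,50", "3,6,59,0", "3,4,20,6"),
  stringsAsFactors = FALSE
)

# strsplit() returns a list with one character vector per row;
# trimws() drops stray spaces and as.numeric() converts the pieces to numbers.
Table$Col2 <- lapply(strsplit(Table$Col2, ",", fixed = TRUE),
                     function(x) as.numeric(trimws(x)))

# Each element is now a real vector, so %in% compares the individual values.
Table$Col2[[2]] %in% Table$Col2[[1]]  # TRUE FALSE TRUE FALSE
```

Once `Col2` is a list column like this, it has the same shape as the answer's `df1$Col2`.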