Lists or vectors within a column of a data frame, to compare each value and count the matches in R

Question

Col1	Col2
a	3,2,20,6
b	3,0,20,15
a	3,2,20,50
b	3,6,59,0
a	3,4,20,6

I have two columns. Col2 is a character column, but I need to transform each entry into a vector, for example c("3", "2", "20", "6", "4", "64", "7", "65", "76", "26", "52", "67", "66", "76", "22"), so that I can apply %in% and obtain the number of TRUE values. For example:

Table$Col3 <- Table$Col2[1] %in% Table$Col2

and get

Col1	Col2	Col3
a	3,2,20,6
b	3,0,20,15	true, false, true, false
a	3,2,20,50	true, true, true, false
b	3,6,59,0	true, true, false, false
a	3,4,20,6	true, false, true, true

And finally count the number of TRUE values:

Col1	Col2	Col3	Col4
a	3,2,20,6
b	3,0,20,15	true, false, true, false	2
a	3,2,20,50	true, true, true, false	3
b	3,6,59,0	true, true, false, false	2
a	3,4,20,6	true, false, true, true	3

But I cannot transform Table$Col2[1] into a vector or list. I always get the whole content between quotes, "c("3", "2", "20", "6", "4", "64", "7", "65", "76", "26", "52", "67", "66", "76", "22")", as a single value, so %in% compares the entire strings rather than each value inside them.

How can I solve that? It occurs to me that I could separate the values into additional columns formatted as integers and then join those values to create the vector or list, but I think that would be very inefficient.

Answer 1

Assuming Col2 is already a list column (as in the data at the end of this answer), we can extract the first list element with [[, loop over the list column with map/imap (from purrr), create a logical vector with %in%, and get the count of TRUE values in Col4 by taking the sum of the logical vector (TRUE counts as 1 and FALSE as 0).

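library(purrr)
library(dplyr)
df1 %>%
   # Col3: for every row except the first, test each value against row 1
   mutate(Col3 = imap(Col2, ~ if(.y == 1) NA else .x %in% Col2[[1]]), 
     # Col4: number of TRUE values per row (the NA in row 1 counts as 0)
     Col4 = map_dbl(Col3, ~ sum(.x, na.rm = TRUE)))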

-output

# A tibble: 5 × 4
  Col1  Col2      Col3       Col4
  <chr> <list>    <list>    <dbl>
1 a     <dbl [4]> <lgl [1]>     0
2 b     <dbl [4]> <lgl [4]>     2
3 a     <dbl [4]> <lgl [4]>     3
4 b     <dbl [4]> <lgl [4]>     2
5 a     <dbl [4]> <lgl [4]>     3
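
For readers who prefer to avoid purrr and dplyr, the same idea can be sketched in base R. This is only an illustrative equivalent, not part of the original answer: it rebuilds a small data frame with the same values as df1 and uses lapply and vapply in place of imap and map_dbl. The variable name ref is introduced here for illustration.

```r
# Example data with the same values as the answer's df1 (Col2 is a list column)
df1 <- data.frame(Col1 = c("a", "b", "a", "b", "a"))
df1$Col2 <- list(c(3, 2, 20, 6), c(3, 0, 20, 15), c(3, 2, 20, 50),
                 c(3, 6, 59, 0), c(3, 4, 20, 6))

# Reference vector: the values stored in the first row
ref <- df1$Col2[[1]]

# Col3: NA for the first row, otherwise a logical vector of matches against ref
df1$Col3 <- lapply(seq_along(df1$Col2), function(i) {
  if (i == 1) NA else df1$Col2[[i]] %in% ref
})

# Col4: number of TRUE values per row (sum() of a logical vector is an integer)
df1$Col4 <- vapply(df1$Col3, function(x) sum(x, na.rm = TRUE), integer(1))

df1$Col4
```

The first row is compared with nothing, so its count is 0, which mirrors the NA handling in the purrr version.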

data

df1 <- structure(list(Col1 = c("a", "b", "a", "b", "a"), Col2 = list(
    c(3, 2, 20, 6), c(3, 0, 20, 15), c(3, 2, 20, 50), c(3, 6, 
    59, 0), c(3, 4, 20, 6))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L))
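
The data above already stores Col2 as a list column, whereas the question starts from a character column such as "3,2,20,6". A minimal sketch of that conversion step, assuming the values are comma-separated numbers and that the data frame is called Table as in the question, could use strsplit from base R:

```r
# Data as described in the question: Col2 holds comma-separated strings
Table <- data.frame(
  Col1 = c("a", "b", "a", "b", "a"),
  Col2 = c("3,2,20,6", "3,0,20,15", "3,2,20,50", "3,6,59,0", "3,4,20,6"),
  stringsAsFactors = FALSE
)

# Split each string on commas, trim stray spaces, and convert to numeric.
# strsplit() returns a list with one vector per row, so Col2 becomes a list column.
Table$Col2 <- lapply(
  strsplit(Table$Col2, ",", fixed = TRUE),
  function(x) as.numeric(trimws(x))
)

# Each row can now be compared value by value against the first row
Table$Col2[[2]] %in% Table$Col2[[1]]
```

After this step %in% works on the individual numbers rather than on the whole string, which is what the question was missing.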

Solution 1
2 2022-07-05 16:45:40
