Lists or vectors within a column of a data frame, to compare each value and count the matches in R

Col1  Col2
a     3,2,20,6
b     3,0,20,15
a     3,2,20,50
b     3,6,59,0
a     3,4,20,6

I have two columns. Col2 is a character column, but I need to transform each entry into a vector, for example c("3", "2", "20", "6", "4", "64", "7", "65", "76", "26", "52", "67", "66", "76", "22"), so that I can apply %in% and obtain the number of TRUE values. For example:

Table$Col3 <- Table$Col2[1] %in% Table$Col2

and get:

Col1  Col2       Col3
a     3,2,20,6
b     3,0,20,15  true, false, true, false
a     3,2,20,50
b     3,6,59,0   true, false, false, false
a     3,4,20,6   true, false, true, false

And finally count the number of TRUE values:

Col1  Col2       Col3                       Col4
a     3,2,20,6
b     3,0,20,15  true, false, true, false   2
a     3,2,20,50  true, true, true, false    3
b     3,6,59,0   true, false, false, false  1
a     3,4,20,6   true, false, true, false   2

But I cannot transform Table$Col2[1] into a vector or list. I always get the entire content between quotes, "c("3", "2", "20", "6", "4", "64", "7", "65", "76", "26", "52", "67", "66", "76", "22")", as a single value, so the comparison is made between whole lists rather than between the individual values inside them.

How can I solve that? It occurs to me that I could separate the values into additional integer columns and then join them back into a vector or list, but I think that would be very inefficient.

We can extract the first list element with [[ , loop over the list column with map/imap (from purrr), build a logical vector with %in%, and get the count of TRUE values in Col4 by summing that logical vector (TRUE counts as 1 and FALSE as 0).

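library(purrr)
library(dplyr)

df1 %>%
   # Col3: for every row after the first, flag which values occur in row 1's vector
   mutate(Col3 = imap(Col2, ~ if(.y == 1) NA else .x %in% Col2[[1]]), 
     # Col4: number of TRUE values per row (the NA in row 1 is dropped, giving 0)
     Col4 = map_dbl(Col3, ~ sum(.x, na.rm = TRUE)))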

-output

# A tibble: 5 × 4
  Col1  Col2      Col3       Col4
  <chr> <list>    <list>    <dbl>
1 a     <dbl [4]> <lgl [1]>     0
2 b     <dbl [4]> <lgl [4]>     2
3 a     <dbl [4]> <lgl [4]>     3
4 b     <dbl [4]> <lgl [4]>     2
5 a     <dbl [4]> <lgl [4]>     3
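
For readers who prefer not to load purrr and dplyr, the same idea can be sketched in base R. This is only an illustration of the approach described above, not part of the original answer: the name df2 and the helper variable ref are made up here, and the data is rebuilt inline so the block runs on its own.

```r
# Same values as df1, rebuilt here so the block is self-contained
df2 <- data.frame(Col1 = c("a", "b", "a", "b", "a"))
df2$Col2 <- list(c(3, 2, 20, 6), c(3, 0, 20, 15), c(3, 2, 20, 50),
                 c(3, 6, 59, 0), c(3, 4, 20, 6))

# Reference vector: the values stored in the first row
ref <- df2$Col2[[1]]

# For every row, flag which of its values also occur in the reference vector
df2$Col3 <- lapply(df2$Col2, function(x) x %in% ref)

# Row 1 is the reference itself, so mark it NA, mirroring the answer
df2$Col3[[1]] <- NA

# TRUE counts as 1 and FALSE as 0, so sum() gives the number of matches
df2$Col4 <- vapply(df2$Col3, sum, integer(1), na.rm = TRUE)

df2$Col4
```

Because %in% tests membership rather than position, a value such as 6 in the fourth row counts as a match even though it sits in a different position than in the first row.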

data

df1 <- structure(list(Col1 = c("a", "b", "a", "b", "a"), Col2 = list(
    c(3, 2, 20, 6), c(3, 0, 20, 15), c(3, 2, 20, 50), c(3, 6, 
    59, 0), c(3, 4, 20, 6))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L))
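
The data above already stores Col2 as a list column. If Col2 starts out as comma-separated strings, as the question describes, one possible way to get there is to split each string first. The following is a minimal sketch using base R's strsplit; the data frame name Table follows the question, and the assumption that the separator is a plain comma with no surrounding spaces is mine.

```r
# Assumed starting point: Col2 holds comma-separated strings, as in the question
Table <- data.frame(
  Col1 = c("a", "b", "a", "b", "a"),
  Col2 = c("3,2,20,6", "3,0,20,15", "3,2,20,50", "3,6,59,0", "3,4,20,6"),
  stringsAsFactors = FALSE
)

# strsplit() returns a list with one character vector per row;
# as.numeric() then converts each piece to a number
Table$Col2 <- lapply(strsplit(Table$Col2, ",", fixed = TRUE), as.numeric)

# Each entry is now a numeric vector rather than a single string
Table$Col2[[1]]
```

After this step, Table$Col2[[1]] (double brackets) returns the vector itself, whereas Table$Col2[1] (single brackets) returns a list of length one, which is why %in% on the latter compares whole entries instead of individual values.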
