
Compare two character vectors in R based on a vector of strings

I have two lists, A and B. The dates in A are 2000-2022, while those in B are 2023-2030.

names(A) and names(B) give the following character vectors:

a <- c("ACC_a_his", "BCC_b_his", "Can_c_his", "CES_d_his")
b <- c("ACC_a_fu", "BCC_b_fu", "Can_c_fu", "CES_d_fu","FGO_c_fu")

Also, I have a string vector, c, whose elements are common across the names in a and b:

c=c("ACC","BCC", "Can", "CES", "FGO")

Note that the strings in c do not always appear in the same position in the filenames. A string can be at the beginning, middle, or end of a filename.

Challenge

  1. Using the strings in c, I would like to get the difference between the names in a and b (i.e., which name exists in b but not in a, or vice versa).

Expected output = "FGO_c_fu"

  2. rbind (or whatever is best) the matching dataframes in lists A and B if their names are similar based on the strings in c.

Update: See OP's comment:

Try this:

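library(dplyr)
library(tibble)
library(tidyr)
library(stringr)
# or just library(tidyverse)

# df is built as in the first answer: pad a and b with NA to equal length
my_list <- tibble::lst(a, b)
df <- data.frame(lapply(my_list, `length<-`, max(lengths(my_list))))

df %>% 
  pivot_longer(everything()) %>% 
  # x = whichever string from c occurs anywhere in the name (NA if none)
  mutate(x = str_extract(value, paste(c, collapse = "|"))) %>% 
  group_by(x) %>% 
  # keep only groups with a single member, i.e. names without a partner
  filter(!any(row_number() > 1)) %>% 
  na.omit() %>% 
  pull(value)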

[1] "FGO_c_fu"
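
The same idea can also be sketched in base R, without building a data frame first. This is only an illustration, under a few assumptions: the question's vector c is renamed keys here (a hypothetical name, chosen to avoid masking base::c), extract_key is a hypothetical helper, the keys contain no regex metacharacters, and no key is a substring of another. Unlike the pipeline above, it compares a against b directly instead of counting occurrences across both.

```r
a <- c("ACC_a_his", "BCC_b_his", "Can_c_his", "CES_d_his")
b <- c("ACC_a_fu", "BCC_b_fu", "Can_c_fu", "CES_d_fu", "FGO_c_fu")
keys <- c("ACC", "BCC", "Can", "CES", "FGO")  # the question's c, renamed (assumption)

pattern <- paste(keys, collapse = "|")

# Hypothetical helper: first key found anywhere in each name, NA if none matches
extract_key <- function(x) {
  m <- regexpr(pattern, x)
  out <- rep(NA_character_, length(x))
  out[m > 0] <- regmatches(x, m)  # regmatches() returns matched elements only
  out
}

key_a <- extract_key(a)
key_b <- extract_key(b)

# Names whose key has no counterpart in the other vector
only_in_a <- a[!is.na(key_a) & !(key_a %in% key_b)]
only_in_b <- b[!is.na(key_b) & !(key_b %in% key_a)]

unmatched <- c(only_in_a, only_in_b)
unmatched
```

With the vectors from the question, unmatched contains only "FGO_c_fu".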

First answer: Here is an alternative approach:

  1. We create a named list of the two vectors.
  2. The vectors are of unequal length.
  3. With data.frame(lapply(my_list, `length<-`, max(lengths(my_list)))) we pad the shorter vector with NA and create a data frame.
  4. Pivot longer and group by everything before the first underscore.
  5. Remove NA and filter:
library(dplyr)
library(tidyr)
library(tibble)

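# named list of the two name vectors
my_list <- tibble::lst(a, b)
# pad the shorter vector with NA so both have the same length
df <- data.frame(lapply(my_list, `length<-`, max(lengths(my_list))))

df %>% 
  pivot_longer(everything()) %>% 
  # group by the part of each name before the first underscore
  group_by(x = sub("_.*", "", value)) %>% 
  # keep only groups with a single member, i.e. names without a partner
  filter(!any(row_number() > 1)) %>% 
  na.omit() %>% 
  pull(value)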
[1] "FGO_c_fu"
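
Neither answer covers the second part of the challenge (row-binding the matching data frames in A and B), so here is a rough sketch of one way it might be done in base R. Everything in it is an assumption for illustration: the toy contents of A and B (columns date and value), the renaming of c to keys, and the helper find_by_key. It also assumes each key matches at most one name per list and that matching data frames share the same columns.

```r
# Hypothetical toy lists standing in for the question's A and B
A <- list(
  ACC_a_his = data.frame(date = 2000:2001, value = c(1, 2)),
  BCC_b_his = data.frame(date = 2000:2001, value = c(3, 4))
)
B <- list(
  ACC_a_fu = data.frame(date = 2023:2024, value = c(5, 6)),
  BCC_b_fu = data.frame(date = 2023:2024, value = c(7, 8)),
  FGO_c_fu = data.frame(date = 2023:2024, value = c(9, 10))
)
keys <- c("ACC", "BCC", "Can", "CES", "FGO")  # the question's c, renamed (assumption)

# Hypothetical helper: first element of lst whose name contains key, or NULL
find_by_key <- function(lst, key) {
  hit <- grep(key, names(lst), fixed = TRUE)
  if (length(hit) == 0) NULL else lst[[hit[1]]]
}

# For every key, stack the historical and future data frames if both exist
combined <- lapply(setNames(keys, keys), function(k) {
  x <- find_by_key(A, k)
  y <- find_by_key(B, k)
  if (is.null(x) || is.null(y)) NULL else rbind(x, y)
})
combined <- Filter(Negate(is.null), combined)  # drop keys without a pair

names(combined)
```

With this toy data, combined holds one four-row data frame each for ACC and BCC; FGO is dropped because it has no counterpart in A.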
