
Determine all characters present in a vector of strings

Say I have the following data frame consisting of two vectors containing character strings:

df <- data.frame(
      "ID"= c("1a", "1b", "1c", "1d"), 
      "Codes" = c("BX.MX|GX.WX", "MX.RX|BX.YX", "MX.OX|GX.GX", "MX.OX|YX.OX"),
      stringsAsFactors = FALSE)

I'd like a simple way to determine which characters have been used in a given vector. In other words, the output of such a function would be:

find.characters(df$Codes) # hypothetical function
[1] "B" "G" "M" "W" "X" "R" "Y" "O" "|" "."

find.characters(df$ID) # hypothetical function
[1] "1" "a" "b" "c" "d"

You can create a custom function to do this. The idea is to split the strings into individual characters with strsplit(v1, ''), which returns a list. We can unlist it to get a vector, then keep the unique elements. The result is not sorted yet. Based on the example shown, you may want to sort the letters separately from the other characters. So we use grepl to build a logical index of the upper-case letters ('[A-Z]'), sort the two subsets separately, and concatenate them with c().
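The first three steps can be traced one at a time on a small input. This is only an illustrative sketch; the vector v1 and the names parts and chars are made up for the example and are not part of the original question.

```r
# Hypothetical two-element input, chosen only to keep the trace short
v1 <- c("BX.MX", "1a")

parts <- strsplit(v1, "")  # list with one character vector per input string
chars <- unlist(parts)     # flattened: "B" "X" "." "M" "X" "1" "a"
unique(chars)              # duplicates dropped, order of first appearance kept
# [1] "B" "X" "." "M" "1" "a"
```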

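 find.characters <- function(v1){
  # split every string into single characters and keep each one once
  x1 <- unique(unlist(strsplit(v1, '')))
  # TRUE for upper-case letters, FALSE for everything else
  indx <- grepl('[A-Z]', x1)
  # sorted letters first, then the sorted remaining characters
  c(sort(x1[indx]), sort(x1[!indx]))
 }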

 find.characters(df$Codes)
 #[1] "B" "G" "M" "O" "R" "W" "X" "Y" "|" "."

 find.characters(df$ID)
 #[1] "1" "a" "b" "c" "d"

NOTE: Generally, I would use grepl('[A-Za-z]', x1), but I didn't do that here because the expected result for the 'ID' column lists the digit before the lower-case letters.
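To make the difference concrete, here is a minimal sketch of the '[A-Za-z]' variant. The name find.characters2 is hypothetical and only used to keep it apart from the function above. With this pattern the lower-case letters count as letters, so they are placed ahead of the digit.

```r
# Hypothetical variant: upper- and lower-case letters form one group
find.characters2 <- function(v1) {
  x1 <- unique(unlist(strsplit(v1, "")))
  is_letter <- grepl("[A-Za-z]", x1)
  c(sort(x1[is_letter]), sort(x1[!is_letter]))
}

ids <- c("1a", "1b", "1c", "1d")  # same values as df$ID
find.characters2(ids)
# [1] "a" "b" "c" "d" "1"
```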

find.characters <- function(x) {
  # c(..., recursive = TRUE) flattens the list returned by strsplit()
  unique(c(strsplit(x, split = ""), recursive = TRUE))
}
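A short usage sketch for this shorter version, assuming the same Codes values as in the question. The function is repeated under a hypothetical name (find.characters.unsorted) so the block runs on its own. Since nothing is sorted, the characters come back in order of first appearance.

```r
# Same approach as the shorter function above, under a hypothetical name
find.characters.unsorted <- function(x) {
  unique(c(strsplit(x, split = ""), recursive = TRUE))
}

codes <- c("BX.MX|GX.WX", "MX.RX|BX.YX", "MX.OX|GX.GX", "MX.OX|YX.OX")
res <- find.characters.unsorted(codes)
res
# [1] "B" "X" "." "M" "|" "G" "W" "R" "Y" "O"
```

If a sorted result is needed, the call can be wrapped in sort(); note that the ordering of punctuation in sort() depends on the locale's collation rules.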
