Determine all characters present in a vector of strings

Question

Say I have the following dataframe consisting of two vectors containing character strings:

df <- data.frame(
      "ID"= c("1a", "1b", "1c", "1d"), 
      "Codes" = c("BX.MX|GX.WX", "MX.RX|BX.YX", "MX.OX|GX.GX", "MX.OX|YX.OX"),
      stringsAsFactors = FALSE)

I'd like a simple way to determine which characters have been used in a given vector. In other words, the output of such a function would reveal:

find.characters(df$Codes) # hypothetical function
[1] "B" "G" "M" "W" "X" "R" "Y" "O" "|" "."

find.characters(df$ID) # hypothetical function
[1] "1" "a" "b" "c" "d"

Answer 1

You can write a custom function for this. The idea is to split the strings into individual characters with strsplit(v1, ''), which returns a list. unlist turns that list into a vector, and unique keeps one copy of each character. The result is not sorted yet. Based on the example shown, you may want to sort the letters separately from the other characters. So we use grepl to flag the uppercase letters, sort the two subsets separately, and concatenate them with c().

 find.characters <- function(v1){
  x1 <- unique(unlist(strsplit(v1, '')))
  indx <- grepl('[A-Z]', x1)
  c(sort(x1[indx]), sort(x1[!indx]))
 }

 find.characters(df$Codes)
 #[1] "B" "G" "M" "O" "R" "W" "X" "Y" "|" "."

 find.characters(df$ID)
 #[1] "1" "a" "b" "c" "d"
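To make the steps described above concrete, here is a small sketch that runs them one at a time on the 'ID' values from the question. The intermediate names (ids, pieces, flat, chars) are only illustrative and are not part of the function above.

```r
# The 'ID' column from the question, as a plain character vector
ids <- c("1a", "1b", "1c", "1d")

# Step 1: split every string into single characters (one list element per string)
pieces <- strsplit(ids, "")

# Step 2: flatten the list into one character vector
flat <- unlist(pieces)   # "1" "a" "1" "b" "1" "c" "1" "d"

# Step 3: keep each character once, in order of first appearance
chars <- unique(flat)    # "1" "a" "b" "c" "d"

print(chars)
```

Sorting only comes into play after this point, which is why the function handles it as a separate step.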

NOTE: Generally, I would use grepl('[A-Za-z]', x1), but I didn't here because it would not reproduce the expected result for the 'ID' column: the lowercase letters would be placed before "1".
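For comparison, a sketch of the more general variant mentioned in the note might look like the following. The name find.characters.alpha is hypothetical, and the use of sort(method = "radix") is my own assumption rather than part of the answer: plain sort() on character vectors follows the current locale's collation, so the order of symbols such as "|" and "." can differ between machines, whereas the radix method orders by the C locale.

```r
# Hypothetical variant: treat upper- AND lowercase letters as "letters",
# and sort with method = "radix" so the order does not depend on the locale.
find.characters.alpha <- function(v1) {
  x1 <- unique(unlist(strsplit(v1, "")))
  is_letter <- grepl("[A-Za-z]", x1)
  c(sort(x1[is_letter], method = "radix"),
    sort(x1[!is_letter], method = "radix"))
}

ids   <- c("1a", "1b", "1c", "1d")
codes <- c("BX.MX|GX.WX", "MX.RX|BX.YX", "MX.OX|GX.GX", "MX.OX|YX.OX")

find.characters.alpha(ids)
# [1] "a" "b" "c" "d" "1"

find.characters.alpha(codes)
# [1] "B" "G" "M" "O" "R" "W" "X" "Y" "." "|"
```

With this variant the letters always come first, so the 'ID' result no longer matches the order requested in the question.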

Answer 2

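find.characters <- function(x) {
  # strsplit() returns a list; c(..., recursive = TRUE) flattens it to a
  # vector, and unique() keeps each character once, in order of first appearance
  unique(c(strsplit(split = "", x), recursive = TRUE))
}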
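A short usage sketch, restating the function so the snippet runs on its own. It shows that this version returns the characters in order of first appearance rather than sorted. The final sort(method = "radix") call is an optional addition of mine, not part of the answer.

```r
# Answer 2's function, restated so this example is self-contained
find.characters <- function(x) {
  unique(c(strsplit(split = "", x), recursive = TRUE))
}

codes <- c("BX.MX|GX.WX", "MX.RX|BX.YX", "MX.OX|GX.GX", "MX.OX|YX.OX")

res <- find.characters(codes)
res
# [1] "B" "X" "." "M" "|" "G" "W" "R" "Y" "O"

# Optional: a locale-independent sort of the result
sort(res, method = "radix")
# [1] "." "B" "G" "M" "O" "R" "W" "X" "Y" "|"
```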
