
Finding strings that are a certain length and contain specific characters

Sample data

a<-c("hour","four","ruoh", "six", "high", "our")

I want to find all strings that contain o, u, and h and are exactly 4 characters long; the order of the characters does not matter.

I want to return "hour", "four", "ruoh". This is my attempt:

grepl("o+u+r", a)
nchar(a)==4

To match strings of length 4 containing the characters h, o, and u, use:

grepl("(?=^.{4}$)(?=.*h)(?=.*o)(?=.*u)",
      c("hour","four","ruoh", "six", "high", "our"),
      perl = TRUE)
[1]  TRUE FALSE  TRUE FALSE FALSE FALSE
  • (?=^.{4}$) : string has length 4.
  • (?=.*x) : x occurs at any position in string.
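
When the required letters or the length vary, the same lookahead pattern can be assembled programmatically. The following is a minimal sketch rather than part of the original answer: `build_pattern` is a hypothetical helper name, and it assumes every required character is a plain letter or digit with no special meaning in a regular expression.

```r
# Hypothetical helper: build a lookahead pattern for an exact length and
# a set of required characters (assumed regex-safe, e.g. plain letters).
build_pattern <- function(chars, len) {
  length_check <- sprintf("(?=^.{%d}$)", len)                  # exact length
  char_checks  <- paste0("(?=.*", chars, ")", collapse = "")   # one lookahead per character
  paste0(length_check, char_checks)
}

a <- c("hour", "four", "ruoh", "six", "high", "our")
pat <- build_pattern(c("h", "o", "u"), 4)
pat
# [1] "(?=^.{4}$)(?=.*h)(?=.*o)(?=.*u)"

a[grepl(pat, a, perl = TRUE)]
# [1] "hour" "ruoh"
```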

You could use strsplit and setdiff. I added an extra edge case ("oouh") to your sample data:

a<-c("hour","four","ruoh", "six", "high", "our","oouh")
a[nchar(a) == 4 &
  lengths(lapply(strsplit(a,""),function(x) setdiff(x, c("o","u","h")))) == 1]
# [1] "hour" "ruoh"
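
One caveat: the setdiff test counts the distinct characters that fall outside the set, so it would also accept a string such as "hhhr", which contains neither o nor u. A sketch of a stricter variant, assuming the goal is simply that all three letters are present, checks membership directly with `%in%`. The names `required`, `chars`, and `has_all` are introduced here for illustration only.

```r
a <- c("hour", "four", "ruoh", "six", "high", "our", "oouh", "hhhr")
required <- c("o", "u", "h")

# Split each string into single characters.
chars <- strsplit(a, "")

# TRUE when every required letter appears among the string's characters.
has_all <- vapply(chars, function(x) all(required %in% x), logical(1))

a[nchar(a) == 4 & has_all]
# [1] "hour" "ruoh" "oouh"
```

Unlike the setdiff version, this keeps "oouh", because it does not require a fourth character from outside the set.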

Or with grepl:

a[nchar(a) == 4 & !rowSums(sapply(c("o","u","h"), Negate(grepl), a))]
# [1] "hour" "ruoh" "oouh"

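sapply(c("o","u","h"), Negate(grepl), a) returns a logical matrix with one row per word and one column per letter, TRUE where the word does not contain that letter. rowSums then counts the missing letters in each row. Applying ! coerces that count to logical, so the combination behaves like a negated row-wise any: only words that are missing none of the letters are kept.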
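
To make the intermediate values concrete, the one-liner can be split into steps. This is only an illustrative breakdown of the same expression; the variable names `is_missing`, `n_missing`, and `keep` are not from the original answer.

```r
a <- c("hour", "four", "ruoh", "six", "high", "our", "oouh")

# One column per letter; TRUE means the word does NOT contain that letter.
is_missing <- sapply(c("o", "u", "h"), Negate(grepl), a)
dim(is_missing)
# [1] 7 3

# Number of required letters each word lacks.
n_missing <- rowSums(is_missing)

# !n_missing is TRUE only where the count is zero.
keep <- nchar(a) == 4 & !n_missing
a[keep]
# [1] "hour" "ruoh" "oouh"
```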

Using grepl with your edited method (r instead of h):

a<-c("hour","four","ruoh", "six", "high", "our")

a[grepl(pattern="o", x=a) & grepl(pattern="u", x=a) & grepl(pattern="r", x=a) & nchar(a)==4]

Returns:

[1] "hour" "four" "ruoh"
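
If the list of letters grows, chaining grepl calls by hand gets unwieldy. One possible generalization, sketched here under the assumption that each letter should be matched literally (hence `fixed = TRUE`), combines the per-letter results with `Reduce`. The names `letters_needed`, `hits`, and `has_all` are illustrative.

```r
a <- c("hour", "four", "ruoh", "six", "high", "our")
letters_needed <- c("o", "u", "r")

# One logical vector per letter: does each string contain it?
hits <- lapply(letters_needed, grepl, x = a, fixed = TRUE)

# Element-wise AND across all letters.
has_all <- Reduce(`&`, hits)

a[has_all & nchar(a) == 4]
# [1] "hour" "four" "ruoh"
```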

