How can I identify the location of the first all-caps word, ignoring words shorter than 5 characters, in R?
I want to identify the location of the first word written in all caps, ignoring words of fewer than 5 characters, in R.
Example data:
myvec <- c("FILT Words Here before CAPITALS words here after","Words OUT ALLCAPS words MORECAPS words after","Mywords PRE Before CAPLET more words after the capital Letters CAPLETTERS","PRE CAP letters SPLIT here not before")
Desired results:
desired_first_word_over4char_caps <- c(5,3,4,4)
desired_first_word_over4char <- c("CAPITALS","ALLCAPS","CAPLET","SPLIT")
Using regular expressions:
> strsplit(myvec, " ") |> lapply(grep, pattern = "^[A-Z]{5,}$") |> sapply(min)
[1] 5 3 4 4
> strsplit(myvec, " ") |> lapply(grep, pattern = "^[A-Z]{5,}$", value=TRUE) |> sapply(head, 1)
[1] "CAPITALS" "ALLCAPS" "CAPLET" "SPLIT"
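Both pipelines above assume every string contains a qualifying word. For a string with none, min() of an empty integer vector returns Inf with a warning, and head(character(0), 1) returns character(0), which makes sapply() return a list instead of a character vector. A minimal sketch of a variant that returns NA in that case. The helper name first_caps_word() and its min_len argument are hypothetical, and words are assumed to be separated by single spaces as in the example data.

```r
# Hypothetical helper (not from any package): for each string, return the first
# space-separated token that is all capitals and at least `min_len` characters
# long, together with its position; both are NA when no token qualifies.
first_caps_word <- function(x, min_len = 5L) {
  pattern <- sprintf("^[A-Z]{%d,}$", min_len)
  words <- strsplit(x, " ", fixed = TRUE)
  # match(TRUE, ...) gives the index of the first TRUE, or NA if there is none
  position <- vapply(words, function(w) match(TRUE, grepl(pattern, w)),
                     integer(1L))
  # Indexing with NA_integer_ yields NA_character_, so no special case is needed
  word <- mapply(function(w, p) w[p], words, position, USE.NAMES = FALSE)
  data.frame(word = word, position = position)
}

myvec <- c("FILT Words Here before CAPITALS words here after",
           "Words OUT ALLCAPS words MORECAPS words after",
           "Mywords PRE Before CAPLET more words after the capital Letters CAPLETTERS",
           "PRE CAP letters SPLIT here not before",
           "no capitals HERE at all")  # extra string with no qualifying word
first_caps_word(myvec)
```

The last string only contains the 4-letter word HERE, so its row comes back as NA rather than producing a warning.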
words <- strsplit(myvec, ' ')
desired_first_word_over4char_caps <- vapply(words, \(x) grep('^[A-Z]{5,}$', x)[1L], integer(1L))
desired_first_word_over4char <- mapply(`[`, words, desired_first_word_over4char_caps)
Here is a one-liner that creates a named vector of the words and their positions, i.e.
mapply(match, stringr::str_extract(myvec, '\\b[A-Z]{5,}\\b'), strsplit(myvec, ' '))
CAPITALS ALLCAPS CAPLET SPLIT
5 3 4 4
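If you would rather avoid the stringr dependency, regexpr() plus regmatches() can stand in for stringr::str_extract(). A minimal sketch under that substitution; the variable names m, first_caps and position are illustrative. Because regmatches() returns only the successful matches, they are written back into an NA-filled vector so its length stays aligned with myvec. perl = TRUE is an assumption here, used so that [A-Z] is a plain ASCII range regardless of locale.

```r
myvec <- c("FILT Words Here before CAPITALS words here after",
           "Words OUT ALLCAPS words MORECAPS words after",
           "Mywords PRE Before CAPLET more words after the capital Letters CAPLETTERS",
           "PRE CAP letters SPLIT here not before")

# First run of 5+ capitals bounded by word boundaries; -1 marks "no match"
m <- regexpr("\\b[A-Z]{5,}\\b", myvec, perl = TRUE)
first_caps <- rep(NA_character_, length(myvec))
first_caps[m != -1] <- regmatches(myvec, m)

# Position of that word among the space-separated tokens (named by the word)
position <- mapply(match, first_caps, strsplit(myvec, " "))
position
```

One caveat that also applies to the stringr one-liner: \b treats punctuation as a word boundary while strsplit(myvec, " ") does not, so in a string such as "say HELLO, world" the regex finds HELLO but match() looks for it among tokens like "HELLO," and returns NA.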
You can then convert the output to whichever format you need, e.g.
res <- mapply(match, stringr::str_extract(myvec, '\\b[A-Z]{5,}\\b'), strsplit(myvec, ' '))
names(res)
#[1] "CAPITALS" "ALLCAPS" "CAPLET" "SPLIT"
unname(res)
#[1] 5 3 4 4
data.frame(position = res)
# position
#CAPITALS 5
#ALLCAPS 3
#CAPLET 4
#SPLIT 4
This uses strcapture to create a data frame with a column char and then adds a column, char_caps, holding the length of each word. No packages are used.
myvec |>
strcapture("\\b([A-Z]{5,})\\b", x = _, data.frame(char = character(0))) |>
transform(char_caps = nchar(char))
## char char_caps
## 1 CAPITALS 8
## 2 ALLCAPS 7
## 3 CAPLET 6
## 4 SPLIT 5
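In this answer char_caps holds each word's length (nchar()), whereas the question's desired_first_word_over4char_caps holds its position in the string. A minimal base-R sketch that keeps the strcapture() step and adds a position column by matching the captured word against the space-split tokens; the column name position and the use of perl = TRUE are assumptions for illustration.

```r
myvec <- c("FILT Words Here before CAPITALS words here after",
           "Words OUT ALLCAPS words MORECAPS words after",
           "Mywords PRE Before CAPLET more words after the capital Letters CAPLETTERS",
           "PRE CAP letters SPLIT here not before")

# One row per input string; `char` is NA when no 5+ capital word is found
caps <- strcapture("\\b([A-Z]{5,})\\b", myvec,
                   proto = data.frame(char = character(0)), perl = TRUE)

# Position of the captured word among the space-separated tokens
caps$position <- mapply(match, caps$char, strsplit(myvec, " "),
                        USE.NAMES = FALSE)
# Length of the captured word, as in the answer above
caps$char_caps <- nchar(caps$char)
caps
```

Keeping both columns makes it easy to see that the word lengths (8, 7, 6, 5) and the positions (5, 3, 4, 4) are different quantities.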