How to identify the position of the first all-caps word while ignoring words of fewer than 5 characters [R]?

Question

I want to identify the position of the first word in all caps, ignoring words of fewer than 5 characters, in R.

Example data:

myvec <- c("FILT Words Here before CAPITALS words here after","Words OUT ALLCAPS words MORECAPS words after","Mywords PRE Before CAPLET more words after the capital Letters CAPLETTERS","PRE CAP letters SPLIT here not before")

Desired results:

desired_first_word_over4char_caps <- c(5,3,4,4)
desired_first_word_over4char <- c("CAPITALS","ALLCAPS","CAPLET","SPLIT")

Answer 1

Using regular expressions:

> strsplit(myvec, " ") |> lapply(grep, pattern = "^[A-Z]{5,}$") |> sapply(min)
[1] 5 3 4 4

> strsplit(myvec, " ") |> lapply(grep, pattern = "^[A-Z]{5,}$", value=TRUE) |> sapply(head, 1)
[1] "CAPITALS" "ALLCAPS"  "CAPLET"   "SPLIT"
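A small sketch of what the pattern accepts, using made-up tokens (the `tokens` vector is an assumption for illustration, not part of the question). `^[A-Z]{5,}$` requires the whole token to consist of at least five uppercase ASCII letters, so short all-caps words, mixed-case words, and tokens with attached punctuation are all rejected.

```r
# Hypothetical tokens chosen to exercise the pattern
tokens <- c("FILT", "Words", "CAPITALS", "SPLIT", "CAPS,", "McCAPS")

# TRUE only when the entire token is 5 or more uppercase letters
is_long_caps <- grepl("^[A-Z]{5,}$", tokens)
print(setNames(is_long_caps, tokens))
```

Because the split is on single spaces, a word followed by a comma or full stop would not match; stripping punctuation first would be one way around that.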

Answer 2

words <- strsplit(myvec, ' ')
desired_first_word_over4char_caps <- vapply(words, \(x) grep('^[A-Z]{5,}$', x)[1L], integer(1L))
desired_first_word_over4char <- mapply(`[`, words, desired_first_word_over4char_caps)
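A quick check of this approach on the question's data, plus one extra sentence that contains no qualifying word (the `extra` string is an assumption added for illustration). Indexing with `[1L]` yields `NA` when `grep` finds nothing, whereas `min()` on an empty result returns `Inf` with a warning.

```r
myvec <- c("FILT Words Here before CAPITALS words here after",
           "Words OUT ALLCAPS words MORECAPS words after",
           "Mywords PRE Before CAPLET more words after the capital Letters CAPLETTERS",
           "PRE CAP letters SPLIT here not before")
extra <- "no LONG caps HERE"   # hypothetical input without a 5+ letter all-caps word

words <- strsplit(c(myvec, extra), " ")

# Position of the first qualifying token per sentence; NA when there is none
pos <- vapply(words, \(x) grep("^[A-Z]{5,}$", x)[1L], integer(1L))

# The token at that position; indexing by NA gives NA
first_word <- mapply(`[`, words, pos)

print(pos)
print(first_word)
```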

Answer 3

Here is a one-liner that creates a named vector of words and positions, i.e.

 mapply(match, stringr::str_extract(myvec, '\\b[A-Z]{5,}\\b'), strsplit(myvec, ' '))

CAPITALS  ALLCAPS   CAPLET    SPLIT 
       5        3        4        4

You can bring the output into the user's desired format, i.e.

res <- mapply(match, stringr::str_extract(myvec, '\\b[A-Z]{5,}\\b'), strsplit(myvec, ' '))

names(res)
#[1] "CAPITALS" "ALLCAPS"  "CAPLET"   "SPLIT"   

unname(res)
#[1] 5 3 4 4

data.frame(position = res)
#         position
#CAPITALS        5
#ALLCAPS         3
#CAPLET          4
#SPLIT           4
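If avoiding the stringr dependency matters, the same idea can be sketched in base R. This is an assumed equivalent rather than part of the answer: `regexpr()` with `regmatches()` stands in for `stringr::str_extract()`, and `perl = TRUE` is chosen here so that `\\b` is handled by the PCRE engine.

```r
myvec <- c("FILT Words Here before CAPITALS words here after",
           "Words OUT ALLCAPS words MORECAPS words after",
           "Mywords PRE Before CAPLET more words after the capital Letters CAPLETTERS",
           "PRE CAP letters SPLIT here not before")

# First run of 5+ uppercase letters delimited by word boundaries, per string
first_caps <- regmatches(myvec, regexpr("\\b[A-Z]{5,}\\b", myvec, perl = TRUE))

# Look up each extracted word among the space-separated tokens of its sentence
res <- mapply(match, first_caps, strsplit(myvec, " "))
print(res)
```

One caveat: `regmatches()` drops strings that have no match, so the two vectors passed to `mapply()` only line up when every sentence contains a qualifying word, as is the case for this data.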

Answer 4

This uses strcapture to create a data frame with column char and then adds a length column, char_caps. Note that char_caps here holds the number of characters in the word, not its position in the sentence. No packages are used.

myvec |>
  strcapture("\\b([A-Z]{5,})\\b", x = _, data.frame(char = character(0))) |>
  transform(char_caps = nchar(char))
##       char char_caps
## 1 CAPITALS         8
## 2  ALLCAPS         7
## 3   CAPLET         6
## 4    SPLIT         5
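Since the question asks for the word's position, the data frame can be extended with one more column. This is a sketch that goes beyond the answer: the `position` column name is an assumption, and it relies on the sentences being split on single spaces.

```r
myvec <- c("FILT Words Here before CAPITALS words here after",
           "Words OUT ALLCAPS words MORECAPS words after",
           "Mywords PRE Before CAPLET more words after the capital Letters CAPLETTERS",
           "PRE CAP letters SPLIT here not before")

# Capture the first all-caps word of 5+ letters from each string
out <- strcapture("\\b([A-Z]{5,})\\b", myvec, data.frame(char = character(0)))

# Word length, as in the answer
out$char_caps <- nchar(out$char)

# Index of the captured word among the space-separated tokens (hypothetical column)
out$position <- mapply(match, out$char, strsplit(myvec, " "), USE.NAMES = FALSE)

print(out)
```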

4 answers

Answer 1
1 accepted 2023-01-03 09:25:50

Answer 2
1 2023-01-03 09:25:56

Answer 3
1 2023-01-03 09:28:17

Answer 4
0 2023-01-03 11:11:52
