How can I identify the location of the first word in all caps, ignoring words of fewer than 5 characters [R]?

In R, I want to identify the position of the first all-caps word in each string, ignoring words shorter than 5 characters.

Example data:

myvec <- c("FILT Words Here before CAPITALS words here after","Words OUT ALLCAPS words MORECAPS words after","Mywords PRE Before CAPLET more words after the capital Letters CAPLETTERS","PRE CAP letters SPLIT here not before")

Desired results:

desired_first_word_over4char_caps <- c(5,3,4,4)
desired_first_word_over4char <- c("CAPITALS","ALLCAPS","CAPLET","SPLIT")

Using regular expressions:

> strsplit(myvec, " ") |> lapply(grep, pattern = "^[A-Z]{5,}$") |> sapply(min)
[1] 5 3 4 4
> strsplit(myvec, " ") |> lapply(grep, pattern = "^[A-Z]{5,}$", value=TRUE) |> sapply(head, 1)
[1] "CAPITALS" "ALLCAPS"  "CAPLET"   "SPLIT" 

words <- strsplit(myvec, ' ')
desired_first_word_over4char_caps <- vapply(words, \(x) grep('^[A-Z]{5,}$', x)[1L], integer(1L))
desired_first_word_over4char <- mapply(`[`, words, desired_first_word_over4char_caps)
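
The two variants above behave differently when a string has no qualifying word: `min()` on an empty result gives `Inf` with a warning, while indexing with `[1L]` gives `NA`. Here is a minimal sketch that wraps the `NA`-returning approach in a function. The name `first_caps_word` and the `min_len` argument are made up for illustration and are not part of the answers above.

```r
# Hypothetical helper: position and text of the first all-caps word with at
# least `min_len` letters; NA in both columns when a string has no such word.
first_caps_word <- function(x, min_len = 5L) {
  pattern <- sprintf("^[A-Z]{%d,}$", min_len)
  words <- strsplit(x, " ", fixed = TRUE)
  # grep() returns integer(0) on no match; [1L] turns that into NA
  pos <- vapply(words, function(w) grep(pattern, w)[1L], integer(1L))
  word <- mapply(function(w, i) w[i], words, pos, USE.NAMES = FALSE)
  data.frame(position = pos, word = word)
}

res <- first_caps_word(c("Words OUT ALLCAPS words", "no caps here"))
res
```

The second input has no all-caps word of 5 or more letters, so its row comes back as `NA` rather than raising a warning.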

Here is a one-liner that creates a named vector of positions, named by the matched words, i.e.

 mapply(match, stringr::str_extract(myvec, '\\b[A-Z]{5,}\\b'), strsplit(myvec, ' '))

CAPITALS  ALLCAPS   CAPLET    SPLIT 
       5        3        4        4 

You can convert the output to whatever format you need, e.g.

res <- mapply(match, stringr::str_extract(myvec, '\\b[A-Z]{5,}\\b'), strsplit(myvec, ' '))

names(res)
#[1] "CAPITALS" "ALLCAPS"  "CAPLET"   "SPLIT"   

unname(res)
#[1] 5 3 4 4

data.frame(position = res)
#         position
#CAPITALS        5
#ALLCAPS         3
#CAPLET          4
#SPLIT           4
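
One caveat, if the real data contains punctuation (the example data does not): `\\b` will extract `CAPITALS` from `CAPITALS,`, but `match()` then looks for it among space-split tokens, where the token is `CAPITALS,`, so the position comes back as `NA`. A rough base R sketch that strips leading and trailing punctuation from each token first; the input string `x` is invented for illustration.

```r
# Made-up input with punctuation attached to the target word
x <- "Words here, then CAPITALS, then more"

tokens <- strsplit(x, " ", fixed = TRUE)[[1]]
# Drop leading/trailing punctuation from each token before testing it
clean <- gsub("^[[:punct:]]+|[[:punct:]]+$", "", tokens)

pos <- grep("^[A-Z]{5,}$", clean)[1L]
pos         # position of the first qualifying word
clean[pos]  # the word itself
```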

This uses strcapture to create a data frame with a column char holding the matched word, and then adds a column char_caps holding its length in characters. No packages are used. (The `_` placeholder requires R 4.2 or later.)

myvec |>
  strcapture("\\b([A-Z]{5,})\\b", x = _, data.frame(char = character(0))) |>
  transform(char_caps = nchar(char))
##       char char_caps
## 1 CAPITALS         8
## 2  ALLCAPS         7
## 3   CAPLET         6
## 4    SPLIT         5
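
Note that `char_caps` above is the number of characters in the matched word, not its position in the string, which is what the question asks for. A possible base R extension, assuming words are separated by single spaces as in the example data, adds a `position` column by looking each matched word up among its string's tokens:

```r
myvec <- c("FILT Words Here before CAPITALS words here after",
           "Words OUT ALLCAPS words MORECAPS words after",
           "Mywords PRE Before CAPLET more words after the capital Letters CAPLETTERS",
           "PRE CAP letters SPLIT here not before")

# First all-caps word of 5+ letters per string (NA if none)
res <- strcapture("\\b([A-Z]{5,})\\b", myvec, data.frame(char = character(0)))

# Index of that word among the space-separated tokens of its string
res$position <- mapply(match, res$char, strsplit(myvec, " ", fixed = TRUE),
                       USE.NAMES = FALSE)
res
```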


