
Convert all letters from uppercase to lowercase in a sentence except words with all CAPS using R

I want to convert my data frame of text to lowercase, but I do not want to convert words that are written entirely in capital letters. For example, if I have a string like

"My friEnd ENRIQUE is nOt GOoD in stuDies"

the output should be

"my friend ENRIQUE is not good in studies"

Everything is converted to lowercase except the words written entirely in capital letters. I need an R function to do this task.

You can do this with gsub and a Perl-compatible regular expression.

String <- "My friEnd ENRIQUE is nOt GOoD in stuDies"
gsub("(\\b\\w*[a-z]\\w*\\b)", "\\L\\1", String, perl=TRUE)
#[1] "my friend ENRIQUE is not good in studies"

Putting \\b word boundaries around the pattern ensures that it operates on separate words. [a-z] picks out the words that contain at least one lowercase letter. The \\w* before and after [a-z] matches any number (including zero) of "word characters", i.e. letters, digits, or underscores. The \\L in the replacement converts the matched text to lowercase; it is only supported with perl=TRUE.
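Since the question mentions a data frame, here is a minimal sketch of applying the same pattern to a whole column, relying on gsub being vectorized over its input. The data frame `df`, the column name `text`, and the second example sentence are hypothetical, made up for illustration.

```r
# Hypothetical data frame; `df` and the column `text` are assumptions.
df <- data.frame(
  text = c("My friEnd ENRIQUE is nOt GOoD in stuDies",
           "NASA Launched a RoCKet"),
  stringsAsFactors = FALSE
)

# Lowercase every word that contains at least one lowercase letter.
# Words written entirely in capitals contain no [a-z], so they never
# match and are left untouched.
df$text <- gsub("(\\b\\w*[a-z]\\w*\\b)", "\\L\\1", df$text, perl = TRUE)

df$text
#[1] "my friend ENRIQUE is not good in studies"
#[2] "NASA launched a rocket"
```

Note that a single capital letter such as "I" counts as an all-caps word under this pattern and is therefore kept as-is.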

We can split the string into words, find the words that contain any lowercase letter ([a-z]), and convert only those words to lowercase.

x <- "My friEnd ENRIQUE is nOt GOoD in stuDies"
word_vec <- strsplit(x, " ")[[1]]
ifelse(grepl('[a-z]', word_vec), tolower(word_vec), word_vec)

#[1] "my"  "friend"  "ENRIQUE" "is"  "not"  "good"  "in"  "studies"

To combine the result into a single string, we can use paste0 with collapse set to a single space.

paste0(ifelse(grepl('[a-z]', word_vec), tolower(word_vec), word_vec), collapse = " ")

#[1] "my friend ENRIQUE is not good in studies"
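The split-and-paste steps above could be wrapped in a small function that accepts a character vector, which makes it usable on a data frame column. This is only a sketch: the name `lower_except_caps` is hypothetical, and it assumes words are separated by single spaces and that the input contains no missing values.

```r
# Hypothetical helper; the name `lower_except_caps` is an assumption.
lower_except_caps <- function(x) {
  vapply(strsplit(x, " ", fixed = TRUE), function(words) {
    # Lowercase only the words that contain at least one lowercase letter.
    has_lower <- grepl("[a-z]", words)
    words[has_lower] <- tolower(words[has_lower])
    # Glue the words of this sentence back together.
    paste(words, collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

lower_except_caps(c("My friEnd ENRIQUE is nOt GOoD in stuDies", "HELLO WoRLD"))
#[1] "my friend ENRIQUE is not good in studies"
#[2] "HELLO world"
```

Unlike the gsub approach, this treats everything between two spaces as one word, so a token such as "ABC-def" would be lowercased as a whole.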


