Convert all letters from uppercase to lowercase in a sentence except words with all CAPS using R
I want to convert the text in my data frame to lowercase, but I do not want to convert words that are written in all caps. For example, given a string like
"My friEnd ENRIQUE is nOt GOoD in stuDies"
the output should be
"my friend ENRIQUE is not good in studies"
Everything is converted to lowercase except the words made up entirely of capital letters. I need an R function to do this.
You can do this with gsub and a (Perl-compatible) regular expression.
String <- "My friEnd ENRIQUE is nOt GOoD in stuDies"
gsub("(\\b\\w*[a-z]\\w*\\b)", "\\L\\1", String, perl=TRUE)
#[1] "my friend ENRIQUE is not good in studies"
The \\b word boundaries ensure that the pattern operates on one word at a time. [a-z] picks out the words that contain at least one lowercase letter. The \\w* before and after [a-z] matches any number (including zero) of "word characters", i.e. letters, digits, or underscores. The \\L in the replacement pattern converts the matched word to lowercase.
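To see how the parts of the pattern behave, here is a small sketch on made-up input. The sample strings, the data frame column `text`, and the names `pattern` and `to_lower_mixed` are assumptions for illustration, not part of the question. It suggests that one-letter capitals such as "I" count as all-caps words, and that digits or underscores do not stop a word from being treated as all caps.

```r
# Same pattern as above: a whole word containing at least one lowercase letter.
pattern <- "(\\b\\w*[a-z]\\w*\\b)"

# Hypothetical wrapper name, used here only for illustration.
to_lower_mixed <- function(x) gsub(pattern, "\\L\\1", x, perl = TRUE)

# One-letter capitals and words made only of capitals, digits, or underscores
# contain no lowercase letter, so they are left unchanged.
to_lower_mixed("I Am A Test ABC_123 aBC_123")
#[1] "I am A test ABC_123 abc_123"

# gsub is vectorised, so a character column of a data frame works the same way.
# The data frame and its column name are assumptions.
df <- data.frame(text = c("My friEnd ENRIQUE is nOt GOoD in stuDies",
                          "NASA Launched a NEW roCket"),
                 stringsAsFactors = FALSE)
df$text <- to_lower_mixed(df$text)
df$text
#[1] "my friend ENRIQUE is not good in studies"
#[2] "NASA launched a NEW rocket"
```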
We can split the string into words, find the words that contain any lowercase letter ([a-z]), and convert only those words to lowercase.
x <- "My friEnd ENRIQUE is nOt GOoD in stuDies"
word_vec <- strsplit(x, " ")[[1]]
ifelse(grepl('[a-z]', word_vec), tolower(word_vec), word_vec)
#[1] "my" "friend" "ENRIQUE" "is" "not" "good" "in" "studies"
To turn the result back into a single string, we can use paste0 with the collapse argument set to a single space.
paste0(ifelse(grepl('[a-z]', word_vec), tolower(word_vec), word_vec), collapse = " ")
#[1] "my friend ENRIQUE is not good in studies"
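The split-and-paste steps handle one string at a time, because `strsplit(x, " ")[[1]]` keeps only the first element of the result. A minimal sketch of wrapping them in a function that accepts a whole character vector (for example a data frame column) might look like this. The function name `lower_mixed_words` and the sample input are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: lowercase every space-separated word that contains
# at least one lowercase letter, leaving all-caps words untouched.
lower_mixed_words <- function(x) {
  vapply(
    strsplit(x, " ", fixed = TRUE),  # one vector of words per input string
    function(words) {
      paste(ifelse(grepl("[a-z]", words), tolower(words), words),
            collapse = " ")
    },
    character(1),
    USE.NAMES = FALSE
  )
}

# Assumed sample input.
sentences <- c("My friEnd ENRIQUE is nOt GOoD in stuDies",
               "NASA Launched a NEW roCket")
lower_mixed_words(sentences)
#[1] "my friend ENRIQUE is not good in studies"
#[2] "NASA launched a NEW rocket"
```

Because the split is on single spaces only, punctuation stays attached to its word, so a token such as "HELLO," is still treated as all caps.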