Replace every sequence of letters with a random word
I created a list of random words:
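library(OpenRepGrid)
list_of_words <- randomWords(100)
# Strip everything except ASCII letters and spaces
list_of_words <- gsub("[^A-Za-z ]", "", list_of_words)
# Keep only words of 4 to 6 characters
list_of_words <- list_of_words[nchar(list_of_words) %in% 4:6]
# Drop every word that occurs more than once (all copies, not just the repeats)
list_of_words <- list_of_words[!(duplicated(list_of_words)|duplicated(list_of_words, fromLast=TRUE))]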
And I have a string as follows:
dat_string <- "Code bla-group Description bla-groep somecoëfficiënt\nP1 building 0,325\nN2111 veggies 0,387"
I would like to replace every run of consecutive letters (Code, bla, Description, ...) with a random word from list_of_words.
I thought of doing:
dat_string <- gsub("[:alpha:]",sample(list_of_words),dat_string)
But the output is a bit unexpected:
"Code bHarryHarry-grouHarry DescriHarrytion bHarryHarry-groeHarry somecoëfficiënt\nP1 buiHarryding 0,325\nN2111 veggies 0,387"
Could anyone explain to me what I am doing wrong here?
You can use:
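library(stringr)
# Drawing length(x) words works whether the function receives one match or all matches at once
str_replace_all(dat_string, "\\p{L}+", function(x) sample(list_of_words, length(x), replace = TRUE))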
Here, \p{L}+ matches one or more Unicode letters (so it matches each word, including accented ones such as somecoëfficiënt), and each match is then replaced by a random element of the list_of_words character vector.
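As for what went wrong in the original attempt, two things seem to be at play. First, `[:alpha:]` only acts as a POSIX class when it is nested inside a bracket expression, i.e. `[[:alpha:]]`; written on its own it is read as a plain set of the characters `:`, `a`, `l`, `p` and `h`, which matches the output shown (only those letters were replaced). Second, `gsub()` uses only the first element of `replacement`, so every match received the same word. A base R alternative can be sketched with `gregexpr()` and `regmatches<-`. The short `list_of_words` here is a stand-in (an assumption, so the example runs without OpenRepGrid), and ë is written as `\u00eb` so the example does not depend on the file's encoding.

```r
# Same string as in the question, with "ë" written as the escape \u00eb.
dat_string <- "Code bla-group Description bla-groep someco\u00ebffici\u00ebnt\nP1 building 0,325\nN2111 veggies 0,387"

# Stand-in word list (assumption) so the example does not need OpenRepGrid.
list_of_words <- c("apple", "river", "stone", "cloud", "maple")

# Locate every run of Unicode letters; perl = TRUE enables \p{L}.
m <- gregexpr("\\p{L}+", dat_string, perl = TRUE)

# Number of letter runs found in the (single) input string.
n_matches <- lengths(regmatches(dat_string, m))

# Draw one word per match (with replacement) and substitute them in place.
result <- dat_string
regmatches(result, m) <- list(sample(list_of_words, n_matches, replace = TRUE))

cat(result, "\n")
```

Digits, commas, hyphens and line breaks are left untouched because only the matched letter runs are overwritten. Use `replace = FALSE` instead if each word should appear at most once, provided the list is at least as long as the number of matches.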