
Replace every sequence of letters with a random word

I created a list of random words:

library(OpenRepGrid)
list_of_words <- randomWords(100)
list_of_words <- gsub("[^A-Za-z ]", "", list_of_words)
list_of_words <- list_of_words[nchar(list_of_words) %in% 4:6]
list_of_words <- list_of_words[!(duplicated(list_of_words)|duplicated(list_of_words, fromLast=TRUE))]

And I have a string as follows:

dat_string <- "Code bla-group Description bla-groep somecoëfficiënt\nP1 building 0,325\nN2111 veggies 0,387"

I would like to replace every run of consecutive letters (Code, bla, Description, ...) with a random word from list_of_words.

I thought of doing:

dat_string <- gsub("[:alpha:]",sample(list_of_words),dat_string) 

But the output is a bit unexpected:

"Code bHarryHarry-grouHarry DescriHarrytion bHarryHarry-groeHarry somecoëfficiënt\nP1 buiHarryding 0,325\nN2111 veggies 0,387"

Could anyone explain to me what I am doing wrong here?

You can use

library(stringr)
str_replace_all(dat_string, "\\p{L}+", function(x) sample(list_of_words, 1))

Here, \p{L}+ matches one or more Unicode letters (thus matching any word), and the function is used to replace each match with a random element of the list_of_words character vector.
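
As for why the original attempt misbehaves, two things appear to be going on. First, [:alpha:] is only a character class when it is written inside a bracket expression, i.e. [[:alpha:]]; judging by the output in the question, on its own it was read as the set of characters :, a, l, p and h, which is why only those letters were replaced. Second, sample(list_of_words) is evaluated once, before gsub() runs, and gsub() uses only the first element of a longer replacement vector (with a warning), so every match receives the same word. Below is a minimal base R sketch of both points. The names words and x are hypothetical stand-ins for list_of_words and dat_string, and the regmatches<- approach is an alternative to the stringr call, not part of the original answer.

```r
# Hypothetical stand-ins for list_of_words and dat_string (ASCII only, so the
# result does not depend on the locale).
words <- c("Harry", "lemon", "tiger", "cloud", "piano")
x <- "Code bla-group\nP1 building 0,325"

# Character class fixed, but still one word for every match: sample(words) is
# evaluated once and gsub() keeps only its first element (with a warning).
one_word <- suppressWarnings(gsub("[[:alpha:]]+", sample(words), x))

# Base R alternative: locate every run of letters, then assign one
# independently sampled word per match.
m <- gregexpr("\\p{L}+", x, perl = TRUE)
n_matches <- lengths(regmatches(x, m))
result <- x
regmatches(result, m) <- list(sample(words, n_matches, replace = TRUE))

cat(one_word, result, sep = "\n\n")
```

Sampling with replace = TRUE lets a word appear more than once, which matters when the text has more letter runs than the list has words. Call set.seed() beforehand if the replacement needs to be reproducible.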
