Create word variations in R

I have an assignment and I have absolutely no idea how to start on it.

I have to create variations of a list of words, in which the characters between the first and the last are replaced with '*' at different positions.

It should look something like this:

input: c('smog', 'sting')

desired output: 's*og', 'sm*g', 's**g', 's*ing', 'st*ng', 'sti*g', 's***g'

Any idea how to achieve something like this?

Thank you very much

UPDATE: I've found this solution:

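s <- c('smog')
# replace the character at position y of x with "*"
f <- function(x, y) {substr(x, y, y) <- "*"; x}
# apply f at every position in x, starting from the word s
g <- function(x) Reduce(f, x, s)
# every combination of interior positions (2 .. nchar - 1), for each size
unlist(lapply(1:(nchar(s)-2), function(x) combn(2:(nchar(s)-1), x, g)))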

output:
[1] "s*og" "sm*g" "s**g"

The only problem is that it works only when s holds a single word, not several.

See also this SO post for related techniques: Create all combinations of letter substitution in string

EDIT

Based on the OP's edit and comment:

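test2 <- c('smog', 'sting')

repfun2 <- function(s){
  # replace the character at position y of x with "*"
  f <- function(x,y) {substr(x,y,y) <- "*"; x}
  g <- function(x) Reduce(f,x,s)
  # every combination of interior positions, for each combination size
  out <- unlist(lapply(1:(nchar(s)-2),function(x) combn(2:(nchar(s)-1),x,g)))
  return(out)
}
lapply(test2, FUN = repfun2)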

Output:

> lapply(test2, FUN = repfun2)
[[1]]
[1] "s*og" "sm*g" "s**g"

[[2]]
[1] "s*ing" "st*ng" "sti*g" "s**ng" "s*i*g" "st**g" "s***g"
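
The function above assumes words of at least four characters. For a three-letter word, 2:(nchar(s)-1) collapses to the single number 2, which combn() interprets as seq_len(2), so the first character gets masked as well; for shorter words 1:(nchar(s)-2) counts downward. Below is a minimal sketch of a more defensive variant that also flattens the result into one vector. The names mask_word and mask_words are hypothetical and not part of the original answer.

```r
# Hypothetical helper: all variants of one word with interior characters masked.
mask_word <- function(word) {
  n <- nchar(word)
  inner <- seq_len(n)[-c(1, n)]          # interior positions only
  if (length(inner) == 0) return(character(0))
  chars <- strsplit(word, "")[[1]]
  unlist(lapply(seq_along(inner), function(k) {
    # combn(m, k, FUN) enumerates index sets over 1..m; map them onto 'inner'
    combn(length(inner), k, FUN = function(idx) {
      masked <- chars
      masked[inner[idx]] <- "*"
      paste(masked, collapse = "")
    })
  }))
}

# Hypothetical wrapper: apply to several words and return one character vector.
mask_words <- function(words) unlist(lapply(words, mask_word))

mask_words(c("smog", "sting"))
```

A word with n characters yields 2^(n-2) - 1 variants this way (3 for 'smog', 7 for 'sting'), and words with no interior characters contribute nothing.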

Previous answer for random replacement

I understand you want a random replacement of characters in a vector of strings. If this is correct, here is an idea:

test2 <- c('smog', 'sting')

repfun <- function(.string) {
  n_char <- nchar(.string)
  # random selection of n characters that will be replaced in the string
  repchar <- sample(1:n_char, size = sample(1:n_char, size = 1))
  # replacing the characters in the string
  for(i in seq_along(repchar)) substring(.string, repchar[i], repchar[i]) <- "*"
  return(.string)
}
lapply(test2, FUN = repfun)

Some outputs:

> lapply(test2, FUN = repfun)
[[1]]
[1] "*mog"

[[2]]
[1] "s*ing"

> lapply(test2, FUN = repfun)
[[1]]
[1] "s*o*"

[[2]]
[1] "s*i*g"

The basic idea is:

  1. Determine the number of characters in a string,
  2. Randomly sample character positions based on that length,
  3. Replace the randomly sampled characters with "*",
  4. Use lapply to pass a vector of character strings.

I think you can improve it by removing the for loop if needed; see some ideas here and here.
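
As one possible way to drop the for loop, the string can be split into single characters so that all sampled positions are replaced in one vectorised assignment. This is only a sketch; mask_random is a hypothetical name, and like repfun it may mask the first and last characters too.

```r
# Hypothetical loop-free variant of repfun().
mask_random <- function(.string) {
  chars <- strsplit(.string, "")[[1]]
  n_char <- length(chars)
  if (n_char == 0) return(.string)
  # choose how many positions to mask, then which ones;
  # sample.int() avoids the sample(x) shorthand when n_char is 1
  repchar <- sample.int(n_char, size = sample.int(n_char, size = 1))
  chars[repchar] <- "*"                  # vectorised replacement
  paste(chars, collapse = "")
}

words <- c("smog", "sting")
vapply(words, mask_random, character(1), USE.NAMES = FALSE)
```

To restrict the masking to interior characters, the positions could be sampled from 2:(n_char - 1) instead, with a guard for words shorter than three characters.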
