
Using purrr to iteratively replace strings in a dataframe column

I would like to use purrr to run several string replacements iteratively on a dataframe column with the gsub() function.

This is the example dataframe:

df <- data.frame(Year = "2019",
                 Text = c(rep("a aa", 5), 
                          rep("a bb", 3), 
                          rep("a cc", 2)))

> df
   Year Text
1  2019 a aa
2  2019 a aa
3  2019 a aa
4  2019 a aa
5  2019 a aa
6  2019 a bb
7  2019 a bb
8  2019 a bb
9  2019 a cc
10 2019 a cc

This is how I would normally run the string replacements, along with the desired result:

df$Text <- gsub("aa", "One", df$Text, fixed = TRUE)
df$Text <- gsub("bb", "Two", df$Text, fixed = TRUE)
df$Text <- gsub("cc", "Three", df$Text, fixed = TRUE)

> df
   Year    Text
1  2019   a One
2  2019   a One
3  2019   a One
4  2019   a One
5  2019   a One
6  2019   a Two
7  2019   a Two
8  2019   a Two
9  2019 a Three
10 2019 a Three

However, this becomes impractical as the list of string replacements grows, so I tried to use purrr to iterate over a list of patterns and replacements, but I've only managed to produce warning messages. I expect the code to iterate through text_pattern and text_replacement and run gsub on df$Text for each pattern/replacement pair. The example is below, along with the messages.

text_pattern <- c("aa", "bb", "cc")
text_replacement <- c("One", "Two", "Three")

walk2(text_pattern, text_replacement, function(...){
  gsub(text_pattern, text_replacement, df$Text, fixed = F)
  }
)

Warning messages:
1: In gsub(text_pattern, text_replacement, df$Text, fixed = F) :
  argument 'pattern' has length > 1 and only the first element will be used
2: In gsub(text_pattern, text_replacement, df$Text, fixed = F) :
  argument 'replacement' has length > 1 and only the first element will be used
3: In gsub(text_pattern, text_replacement, df$Text, fixed = F) :
  argument 'pattern' has length > 1 and only the first element will be used
4: In gsub(text_pattern, text_replacement, df$Text, fixed = F) :
  argument 'replacement' has length > 1 and only the first element will be used
5: In gsub(text_pattern, text_replacement, df$Text, fixed = F) :
  argument 'pattern' has length > 1 and only the first element will be used
6: In gsub(text_pattern, text_replacement, df$Text, fixed = F) :
  argument 'replacement' has length > 1 and only the first element will be used

Is it possible to accomplish this using functions from purrr? Or am I trying to use the wrong tool, and is there a different function I should be using?

We can use reduce2:

library(purrr)
library(stringr)
df$Text <- reduce2(text_pattern, text_replacement, ~ str_replace(..1, ..2, ..3), 
           .init = df$Text)
df$Text
#[1] "a One"   "a One"   "a One"   "a One"   "a One"   "a Two"   "a Two"   "a Two"   "a Three" "a Three"

Or without an anonymous function:

reduce2(text_pattern, text_replacement, .init = df$Text, str_replace)
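One thing to keep in mind when comparing this with the original gsub() calls: str_replace() replaces only the first match in each string and treats the pattern as a regular expression, whereas gsub(..., fixed = TRUE) replaces every literal match. For the example data the results are the same, but a closer equivalent is sketched below. This is a minimal sketch, not part of the original answer: the variable names text, expected, via_gsub, via_stringr, lookup and via_lookup are made up for illustration, and the named-vector form of str_replace_all() is offered as one possible alternative to reduce2.

```r
library(purrr)
library(stringr)

# Illustrative copy of the question's data (names here are hypothetical).
text <- c(rep("a aa", 5), rep("a bb", 3), rep("a cc", 2))
text_pattern <- c("aa", "bb", "cc")
text_replacement <- c("One", "Two", "Three")

# Reference result: the chained gsub() calls from the question, written as a loop.
expected <- text
for (i in seq_along(text_pattern)) {
  expected <- gsub(text_pattern[i], text_replacement[i], expected, fixed = TRUE)
}

# reduce2() with gsub(): the accumulated text has to be passed as gsub()'s
# third argument, so a small wrapper function reorders the arguments.
via_gsub <- reduce2(
  text_pattern, text_replacement,
  function(acc, pattern, replacement) {
    gsub(pattern, replacement, acc, fixed = TRUE)
  },
  .init = text
)

# reduce2() with str_replace_all() and fixed(): replaces every literal match,
# which mirrors gsub(..., fixed = TRUE).
via_stringr <- reduce2(
  text_pattern, text_replacement,
  function(acc, pattern, replacement) {
    str_replace_all(acc, fixed(pattern), replacement)
  },
  .init = text
)

# Without purrr: str_replace_all() accepts a named vector of the form
# c(pattern = replacement). The names are treated as regular expressions.
lookup <- setNames(text_replacement, text_pattern)
via_lookup <- str_replace_all(text, lookup)

print(via_gsub)
```

All three approaches should agree with the chained gsub() calls on this data. If the patterns contained regex metacharacters, the named-vector form would need them escaped.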

@akrun's answer is great, but there are a few intermediate points you may also find useful for understanding purrr better.

  1. walk2 won't return the output of .f; it just returns the first input vector.

    From the docs:

    walk() calls .f for its side-effect and returns the input .x.

    The closest analog for what you're doing is map2, but see below for why that's also not quite what you need.

  2. Arguments inside purrr functions like map and walk refer to generic representations of the vectors being iterated over.

    In your code, function(...) never uses the elements passed in on each iteration; its body refers to the full text_pattern and text_replacement vectors, which is why gsub warns that only the first element will be used. You have a couple of options for how to refer to the input vectors. One is to name the arguments in function(...). For example, with function(x, y), this runs without warnings:

     # switching to map2() because walk2() returns its input rather than the results
     map2(text_pattern, text_replacement, function(x, y) {
       gsub(x, y, df$Text, fixed = FALSE)
     })

    You can also use ~ syntax and then refer to the input iterables as .x and .y:

     map2(text_pattern, text_replacement, ~gsub(.x, .y, df$Text, fixed = FALSE))

  3. The output isn't what you are expecting.

    purrr functions like map and walk apply the function to the entire vector once for each pattern. The output of both code snippets in 2. is the following:

     [[1]]
     [1] "a One" "a One" "a One" "a One" "a One" "a bb" "a bb" "a bb" "a cc" "a cc"

     [[2]]
     [1] "a aa" "a aa" "a aa" "a aa" "a aa" "a Two" "a Two" "a Two" "a cc" "a cc"

     [[3]]
     [1] "a aa" "a aa" "a aa" "a aa" "a aa" "a bb" "a bb" "a bb"
     [9] "a Three" "a Three"

    So even after fixing the syntax, you still get a three-element list, where each element holds the result of the replacement for one text_pattern/text_replacement pair, applied to the original df$Text. There's still a combining step that needs to happen to bring them all together into one vector with every replacement applied. That's what @akrun's switch to reduce2 accomplishes.

    One additional note on reduce syntax: the arguments ..1, ..2 and ..3 refer to the inputs on each iteration. ..1 is the accumulated value: .init sets it to df$Text for the first iteration, and after that it is the result of the previous call. ..2 and ..3 are what, in the earlier map2 examples, were .x and .y, respectively (i.e. the pattern and replacement values). See the reduce docs for more.
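The three behaviours described in the points above can be compared side by side. The following is a small sketch on a shortened three-element vector; the names text, walk_result, map_result, loop_result and reduce_result are hypothetical and chosen only for this illustration. The for loop is written out to show what reduce2 does conceptually, not how purrr implements it.

```r
library(purrr)

# Shortened stand-in for df$Text (hypothetical name).
text <- c("a aa", "a bb", "a cc")
text_pattern <- c("aa", "bb", "cc")
text_replacement <- c("One", "Two", "Three")

# walk2(): runs the function for its side effects and invisibly
# returns its first input, so the gsub() results are discarded.
walk_result <- walk2(text_pattern, text_replacement,
                     function(x, y) gsub(x, y, text, fixed = TRUE))
identical(walk_result, text_pattern)  # TRUE

# map2(): returns one result per pattern/replacement pair, and every
# call starts again from the original text.
map_result <- map2(text_pattern, text_replacement,
                   function(x, y) gsub(x, y, text, fixed = TRUE))
length(map_result)  # 3
map_result[[2]]     # "a aa" "a Two" "a cc"

# reduce2(): each call receives the result of the previous call, which is
# what this explicit loop does by overwriting loop_result on every pass.
loop_result <- text
for (i in seq_along(text_pattern)) {
  loop_result <- gsub(text_pattern[i], text_replacement[i],
                      loop_result, fixed = TRUE)
}

reduce_result <- reduce2(
  text_pattern, text_replacement,
  function(acc, x, y) gsub(x, y, acc, fixed = TRUE),
  .init = text
)
identical(reduce_result, loop_result)  # TRUE
reduce_result                          # "a One" "a Two" "a Three"
```

Only the reduce2 version (or the equivalent loop) carries each replacement forward into the next step, which is why it yields a single vector with all replacements applied.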
