Conditionally applying functions to a grouped dataframe in R using magrittr, dplyr and purrr

Question

I would like to use the succinctness of magrittr, dplyr and possibly purrr to split a large dataframe (with many variables of different types) by one variable x, and then apply different functions to a second variable y, conditionally on x, for each group and each row within a group.

Take the dataframe df <- data.frame(a, b, x, c, d, y), where x is a factor with levels foo and bar, and y is numeric. I can do what I have described, inelegantly, with an unpiped workflow:

df$y[df$x == "foo"] %<>% subtract(min(.))
df$y[df$x == "bar"] %<>% add(max(df$y[df$x == "foo"]))

I would like to rewrite this using dplyr and add it to a long pipe for df, but all my attempts to combine mutate, sapply and do have failed, as have attempts to incorporate purrr with anonymous functions, by_slice and dmap.

Many thanks in advance for the advice.

Answer 1

This is more dplyr than magrittr, but I think it's also more readable. I'm a bit uncomfortable with %<>% because it disrupts the linear structure of operations and makes the code harder to read, so I just use %>% here.
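For readers less familiar with the compound assignment pipe, here is a minimal sketch of how it differs from the ordinary pipe. The vector v is a made-up example and not part of the question's data.

```r
library(magrittr)

v <- c(3, 5, 9)   # hypothetical example vector

# %>% returns a new value and leaves v untouched
w <- v %>% subtract(min(.))

# %<>% pipes v through the call and assigns the result back to v
v %<>% subtract(min(.))

# both routes give the same values; only the second one overwrote v
identical(v, w)
```

Because %<>% modifies its left-hand side in place, it cannot sit in the middle of a longer %>% chain, which is the readability concern raised above.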

An example dataframe that matches your description:

df <- data.frame(a = 'a', 
                 b = 'b', 
                 x = c("foo", "bar") , 
                 c = 'c', 
                 d = 'd', 
                 y = 1:6) 
df
  a b   x c d y
1 a b foo c d 1
2 a b bar c d 2
3 a b foo c d 3
4 a b bar c d 4
5 a b foo c d 5
6 a b bar c d 6

Your code:

library(dplyr)
library(magrittr)
df$y[df$x == "foo"] %<>% subtract(min(.))

df
  a b   x c d y
1 a b foo c d 0
2 a b bar c d 2
3 a b foo c d 2
4 a b bar c d 4
5 a b foo c d 4
6 a b bar c d 6

df$y[df$x == "bar"] %<>% add(max(df$y[df$x == "foo"]))

df
  a b   x c d  y
1 a b foo c d  0
2 a b bar c d  6
3 a b foo c d  2
4 a b bar c d  8
5 a b foo c d  4
6 a b bar c d 10

A dplyr solution:

df %>% 
  # subtract the minimum of the "foo" rows only, as the original code does
  mutate(y = ifelse(x == "foo", y - min(y[x == "foo"]), y)) %>% 
  mutate(y = ifelse(x == "bar", y + max(y[x == 'foo']), y))

  a b   x c d  y
1 a b foo c d  0
2 a b bar c d  6
3 a b foo c d  2
4 a b bar c d  8
5 a b foo c d  4
6 a b bar c d 10
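If you prefer a single mutate step, one possible variant uses dplyr's case_when. This is only a sketch under the assumption that the example dataframe above is representative; the helper columns foo_min and foo_max are names introduced here for illustration and are dropped again at the end.

```r
library(dplyr)

# same example dataframe as above
df <- data.frame(a = "a", b = "b", x = c("foo", "bar"),
                 c = "c", d = "d", y = 1:6)

result <- df %>%
  mutate(
    # summaries of the "foo" rows, computed once on the original y
    foo_min = min(y[x == "foo"]),
    foo_max = max(y[x == "foo"]),
    y = case_when(
      x == "foo" ~ y - foo_min,               # shift foo so its minimum is 0
      x == "bar" ~ y + (foo_max - foo_min),   # add the maximum of the shifted foo values
      TRUE       ~ y                          # leave any other level unchanged
    )
  ) %>%
  select(-foo_min, -foo_max)

result
```

Computing both summaries from the original y means the bar branch does not depend on the foo branch having run first, so the whole transformation fits in one step of a longer pipe.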
