Conditionally applying functions to grouped dataframes in R with magrittr, dplyr and purrr
I would like to use the succinctness of magrittr, dplyr and possibly purrr to split a large dataframe (with many variables of different types) by one variable x and then apply different functions, conditional on x, to a second variable y, both per group and per row within a group.
Take the dataframe df <- data.frame(a, b, x, c, d, y), where x is a factor (foo, bar) and y is numeric. I can do what I have described inelegantly with an unpiped workflow, thus:
df$y[df$x == "foo"] %<>% subtract(min(.))
df$y[df$x == "bar"] %<>% add(max(df$y[df$x == "foo"]))
I would like to rewrite this using dplyr and add it to a long pipe for df, but all my attempts to combine mutate, sapply and do have failed, as have attempts to incorporate purrr with anonymous functions, by_slice and dmap.
Many thanks in advance for the advice.
This is more dplyr than magrittr, but I think it's also more readable. I'm a bit uncomfortable with %<>% because it disrupts the linear structure of operations and makes the code harder to read, so I just use %>% here.
An example dataframe that matches your description:
df <- data.frame(a = 'a',
                 b = 'b',
                 x = c("foo", "bar"),
                 c = 'c',
                 d = 'd',
                 y = 1:6)
df
  a b   x c d y
1 a b foo c d 1
2 a b bar c d 2
3 a b foo c d 3
4 a b bar c d 4
5 a b foo c d 5
6 a b bar c d 6
Your code:
library(dplyr)
library(magrittr)
df$y[df$x == "foo"] %<>% subtract(min(.))
df
  a b   x c d y
1 a b foo c d 0
2 a b bar c d 2
3 a b foo c d 2
4 a b bar c d 4
5 a b foo c d 4
6 a b bar c d 6
df$y[df$x == "bar"] %<>% add(max(df$y[df$x == "foo"]))
df
  a b   x c d  y
1 a b foo c d  0
2 a b bar c d  6
3 a b foo c d  2
4 a b bar c d  8
5 a b foo c d  4
6 a b bar c d 10
A dplyr solution:
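df %>%
  # min() is restricted to the foo rows, matching the unpiped code above
  mutate(y = ifelse(x == "foo", y - min(y[x == "foo"]), y)) %>%
  # the second step sees the already-shifted foo values
  mutate(y = ifelse(x == "bar", y + max(y[x == "foo"]), y))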
  a b   x c d  y
1 a b foo c d  0
2 a b bar c d  6
3 a b foo c d  2
4 a b bar c d  8
5 a b foo c d  4
6 a b bar c d 10
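One thing worth checking when translating the unpiped code into a pipe is which rows min() and max() actually see. In the example data the overall minimum of y happens to belong to foo, so an unrestricted min(y) and a foo-only min(y[x == "foo"]) give the same answer. The sketch below uses a small hypothetical dataframe (df2, not from the question) in which the smallest y belongs to bar, and compares a base R version of the unpiped workflow with a piped version that subsets explicitly. It uses if_else rather than ifelse purely as a matter of taste; treat it as an illustration, not the only way to write this.

```r
library(dplyr)

# Hypothetical data: the smallest y (1) sits in the "bar" group
df2 <- data.frame(x = c("bar", "foo", "bar", "foo", "bar", "foo"),
                  y = c(1, 5, 2, 7, 3, 9))

# Base R reference, equivalent to the unpiped workflow in the question
ref <- df2
is_foo <- ref$x == "foo"
ref$y[is_foo]  <- ref$y[is_foo] - min(ref$y[is_foo])
ref$y[!is_foo] <- ref$y[!is_foo] + max(ref$y[is_foo])

# Piped version: min() and max() are both restricted to the foo rows
piped <- df2 %>%
  mutate(y = if_else(x == "foo", y - min(y[x == "foo"]), y)) %>%
  mutate(y = if_else(x == "bar", y + max(y[x == "foo"]), y))

# Stops with an error if the two results differ
stopifnot(isTRUE(all.equal(ref, piped)))
piped$y
# [1] 5 0 6 2 7 4
```

With an unrestricted min(y) in the first step, the foo rows would be shifted by 1 instead of 5 on this data, so the piped result would no longer match the reference.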