Ordering problems when using mutate with ifelse condition to date
I'm trying to use mutate to create a column that takes the value of one column up to a point and then uses cumprod to fill the rest of the observations based on the values of another column.
I tried combining mutate with ifelse, but the resulting values come out in the wrong order and I can't figure out why.
Below I reproduce a more basic example that replicates my problem:
library(dplyr)
foo1 <- data.frame(date=seq(2005,2018,1))
foo1 %>% mutate(h=ifelse(date>2008, seq(1,11,1), 99))
The output is:
date h
1 2005 99
2 2006 99
3 2007 99
4 2008 99
5 2009 5
6 2010 6
7 2011 7
8 2012 8
9 2013 9
10 2014 10
11 2015 1
12 2016 2
13 2017 3
14 2018 4
And I'd like it to be:
date h
1 2005 99
2 2006 99
3 2007 99
4 2008 99
5 2009 1
6 2010 2
7 2011 3
8 2012 4
9 2013 5
10 2014 6
11 2015 7
12 2016 8
13 2017 9
14 2018 10
Edit:
Below I reproduce another example (closer to what I'm trying to do).
foo2 <- data.frame(date=seq(2005,2013,1), a=seq(1, by=1, length.out = 9), b=rep(1.01, length.out = 9))
foo2 %>% mutate(h=ifelse(date>2008, cumprod(c(a[5],b[5:9])), a))
The output I get is:
date a b h
1 2005 1 1.01 1.00000
2 2006 2 1.01 2.00000
3 2007 3 1.01 3.00000
4 2008 4 1.01 4.00000
5 2009 5 1.01 5.20302
6 2010 6 1.01 5.25505
7 2011 7 1.01 5.00000
8 2012 8 1.01 5.05000
9 2013 9 1.01 5.10050
And I'd like it to be:
date a b h
1 2005 1 1.01 1.00000
2 2006 2 1.01 2.00000
3 2007 3 1.01 3.00000
4 2008 4 1.01 4.00000
5 2009 5 1.01 5.00000
6 2010 6 1.01 5.05000
7 2011 7 1.01 5.10050
8 2012 8 1.01 5.20302
9 2013 9 1.01 5.25505
If I use if_else instead of ifelse, I receive the following error:
Error in mutate_impl(.data, dots) :
Evaluation error: `true` must be length 9 (length of `condition`) or one, not 6
You were nearly there:
foo1 %>% mutate(h = if_else(date > 2008, cumsum(date > 2008), 99L))
# date h
#1 2005 99
#2 2006 99
#3 2007 99
#4 2008 99
#5 2009 1
#6 2010 2
#7 2011 3
#8 2012 4
#9 2013 5
#10 2014 6
#11 2015 7
#12 2016 8
#13 2017 9
#14 2018 10
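As an alternative sketch in base R (not part of the original answer), the counter can be computed only for the rows that meet the condition and then written into those positions. The variable names test and h are chosen here for illustration.

```r
foo1 <- data.frame(date = seq(2005, 2018, 1))

# Rows that should receive the running counter.
test <- foo1$date > 2008

# Start from the default value, then overwrite only the TRUE positions.
# seq_len(sum(test)) has exactly one value per TRUE row, so nothing is recycled.
h <- replace(rep(99, nrow(foo1)), test, seq_len(sum(test)))

foo1$h <- h
foo1
```

Because the replacement values are generated with the same length as the number of matching rows, the result does not depend on where in the data frame the condition first becomes TRUE.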
PS. It's recommended to use dplyr's if_else instead of base R's ifelse, since it is stricter about the types of its arguments.
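The if_else error quoted in the question ("must be length 9 ... not 6") points at the same kind of length mismatch in the second example. A minimal base R check, using the foo2 data from the question and the illustrative names test and yes:

```r
foo2 <- data.frame(date = seq(2005, 2013, 1),
                   a = seq(1, by = 1, length.out = 9),
                   b = rep(1.01, length.out = 9))

test <- foo2$date > 2008
yes  <- cumprod(c(foo2$a[5], foo2$b[5:9]))

length(test)  # number of rows in foo2
length(yes)   # a[5] followed by five cumulative products
sum(test)     # number of rows that satisfy the condition
```

The yes vector matches neither the number of rows nor the number of TRUE rows, so ifelse silently recycles it while if_else refuses to.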
The ifelse function takes three arguments:
test: a logical vector. Say that it has a length of N.
yes: a vector. It can be of any length. If its length is not N, the vector is recycled/shortened to be of length N.
no: same as yes.
At the end of this preprocessing stage, you have 3 vectors of the same length. ifelse then builds the return value by selecting from the second vector or the third vector, element by element, depending on test.
In your case we have:
test <- foo1$date>2008 #length: 14
yes <- seq(1,11,1) #length: 11
no <- 99 #length: 1
So, it needs to recycle both yes and no. You end up with something like:
test yes no
FALSE 1 99
FALSE 2 99
FALSE 3 99
FALSE 4 99
TRUE 5 99
TRUE 6 99
TRUE 7 99
TRUE 8 99
TRUE 9 99
TRUE 10 99
TRUE 11 99
TRUE 1 99
TRUE 2 99
TRUE 3 99
You see how the recycling works. Then, to build the return value, ifelse selects, in the order above, the yes element where test is TRUE and the no element otherwise. This explains why you get that return value. It has nothing to do with dplyr, of course.
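The recycling described above can be reproduced by hand. This is a rough sketch that imitates the preprocessing with rep_len and then compares the result with ifelse; h_manual and h_ifelse are names introduced here for illustration.

```r
foo1 <- data.frame(date = seq(2005, 2018, 1))

test <- foo1$date > 2008
yes  <- rep_len(seq(1, 11, 1), length(test))  # 1..11, then 1, 2, 3
no   <- rep_len(99, length(test))             # 99 repeated 14 times

# Pick element by element: yes[i] where test[i] is TRUE, no[i] otherwise.
h_manual <- no
h_manual[test] <- yes[test]

# Same call as in the question, outside of mutate.
h_ifelse <- ifelse(test, seq(1, 11, 1), 99)

all.equal(h_manual, h_ifelse)
```

The first TRUE sits in position 5, so the first value taken from yes is its fifth element rather than its first, which is why the counter does not start at 1.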