
Ordering problems when using mutate with ifelse condition to date

I'm trying to use mutate to create a column that takes the value of one column up to a point and then uses cumprod to fill the rest of the observations based on the values of another column.

I tried combining mutate with ifelse, but the values come out in the wrong order and I can't figure out why.

Below is a more basic example that reproduces my problem:

library(dplyr)  # provides %>% and mutate
foo1 <- data.frame(date=seq(2005,2018,1))
foo1 %>% mutate(h=ifelse(date>2008, seq(1,11,1), 99))

The output is:

   date  h
1  2005 99
2  2006 99
3  2007 99
4  2008 99
5  2009  5
6  2010  6
7  2011  7
8  2012  8
9  2013  9
10 2014 10
11 2015  1
12 2016  2
13 2017  3
14 2018  4

And I'd like it to be:

   date  h
1  2005 99
2  2006 99
3  2007 99
4  2008 99
5  2009  1
6  2010  2
7  2011  3
8  2012  4
9  2013  5
10 2014  6
11 2015  7
12 2016  8
13 2017  9
14 2018 10

Edit:

Below is another example, closer to what I'm actually trying to do.

foo2 <- data.frame(date=seq(2005,2013,1), a=seq(1, by=1, length.out = 9), b=rep(1.01, length.out = 9))
foo2 %>% mutate(h=ifelse(date>2008, cumprod(c(a[5],b[5:9])), a))

The output I get is:

  date a    b       h
1 2005 1 1.01 1.00000
2 2006 2 1.01 2.00000
3 2007 3 1.01 3.00000
4 2008 4 1.01 4.00000
5 2009 5 1.01 5.20302
6 2010 6 1.01 5.25505
7 2011 7 1.01 5.00000
8 2012 8 1.01 5.05000
9 2013 9 1.01 5.10050

And I'd like it to be:

  date a    b       h
1 2005 1 1.01 1.00000
2 2006 2 1.01 2.00000
3 2007 3 1.01 3.00000
4 2008 4 1.01 4.00000
5 2009 5 1.01 5.00000
6 2010 6 1.01 5.05000
7 2011 7 1.01 5.10050
8 2012 8 1.01 5.20302
9 2013 9 1.01 5.25505

If I use if_else instead of ifelse, I receive the following error:

Error in mutate_impl(.data, dots) : 
  Evaluation error: `true` must be length 9 (length of `condition`) or one, not 6

You were nearly there:

foo1 %>% mutate(h = if_else(date > 2008, cumsum(date > 2008), 99L))
#   date  h
#1  2005 99
#2  2006 99
#3  2007 99
#4  2008 99
#5  2009  1
#6  2010  2
#7  2011  3
#8  2012  4
#9  2013  5
#10 2014  6
#11 2015  7
#12 2016  8
#13 2017  9
#14 2018 10

PS. It's recommended to use dplyr's if_else instead of base R's ifelse.
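The same idea, passing if_else a `true` vector that already has one value per row, might carry over to the cumprod case from the edit. The following is only a sketch under assumptions: the helper columns `after` and `growth` are hypothetical names, and it assumes the first row after the cutoff should keep its own `a`, with each later row multiplying the previous value by that row's `b`.

```r
library(dplyr)

foo2 <- data.frame(
  date = seq(2005, 2013, 1),
  a    = seq(1, by = 1, length.out = 9),
  b    = rep(1.01, length.out = 9)
)

res <- foo2 %>%
  mutate(
    # hypothetical helper: TRUE for rows after the cutoff year
    after  = date > 2008,
    # hypothetical helper: growth factor is 1 up to and including the first
    # row after the cutoff, and b for every later row (an assumption)
    growth = if_else(after & lag(after, default = FALSE), b, 1),
    # anchor on a in the first row after the cutoff, then compound;
    # both branches are full-length vectors, so nothing gets recycled
    h      = if_else(after, a[match(TRUE, after)] * cumprod(growth), as.numeric(a))
  ) %>%
  select(date, a, b, h)

print(res)
```

Because cumprod runs over the whole column rather than a 6-element slice, the `true` argument has the same length as the condition, which is what if_else asks for in the error message from the question.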

The ifelse function takes three arguments:

  1. test: a logical vector. Say that it has a length of N.
  2. yes: a vector. It can be of any length. If the length is not N, the vector is recycled/shortened to be of length N.
  3. no: same as yes.

At the end of this preprocessing stage, you have three vectors of the same length. ifelse then builds the return value by selecting, position by position, from the second or the third vector depending on test.
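As a rough illustration of those two stages, here is a small base R sketch that imitates what ifelse does for plain vectors. It is not the actual source of ifelse; the names `yes_full`, `no_full` and `manual` are made up for this example.

```r
test <- c(FALSE, FALSE, TRUE, TRUE, TRUE)  # length N = 5
yes  <- c(10, 20)                          # shorter than N, will be recycled
no   <- 0                                  # length 1, will be recycled

# Stage 1: bring yes and no to length N
yes_full <- rep_len(yes, length(test))     # 10 20 10 20 10
no_full  <- rep_len(no, length(test))      # 0 0 0 0 0

# Stage 2: pick position by position according to test
manual <- no_full
manual[test] <- yes_full[test]

print(manual)                              # 0 0 10 20 10
print(identical(manual, ifelse(test, yes, no)))
```

Note that the first TRUE position receives the third element of the recycled yes vector, not its first: positions are matched by index, not by counting TRUEs.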

In your case we have:

test <- foo1$date>2008 #length: 14
yes <- seq(1,11,1) #length: 11
no <- 99 #length: 1

So, it needs to recycle both yes and no. You end up with something like:

 test yes no
FALSE   1 99
FALSE   2 99
FALSE   3 99
FALSE   4 99
 TRUE   5 99
 TRUE   6 99
 TRUE   7 99
 TRUE   8 99
 TRUE   9 99
 TRUE  10 99
 TRUE  11 99
 TRUE   1 99
 TRUE   2 99
 TRUE   3 99

You can see how the recycling works. Then, to build the return value, ifelse selects, in the order above, the yes element if test is TRUE and the no element otherwise. This explains why you get that return value. It's not about dplyr, of course.
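The table above can be checked in base R, and one possible way around the recycling is to assign only into the TRUE positions. This is a sketch, with `h_bad` and `h_good` as made-up names:

```r
date <- seq(2005, 2018, 1)
test <- date > 2008                        # 4 FALSE followed by 10 TRUE

# What ifelse does with an 11-element yes against a 14-element test
h_bad <- ifelse(test, seq(1, 11, 1), 99)
print(h_bad)                               # 99 99 99 99 5 6 7 8 9 10 11 1 2 3

# Alternative: start from the default and fill just the TRUE positions,
# so the first TRUE row gets 1, the second gets 2, and so on
h_good <- rep(99, length(test))
h_good[test] <- seq_len(sum(test))
print(h_good)                              # 99 99 99 99 1 2 3 4 5 6 7 8 9 10
```

Subset assignment pairs the replacement values with the selected positions in order, which is the behaviour the question expected from ifelse.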
