r - calculate using next non-NA value in data frame column

I have some data in a data frame, and I would like to calculate the percentage change between monthly values. The problem is that some entries are NA, which throws off the calculation.

       irm     code        price    pct.change
1  201807 511130F075A04      4.6600   2.192982
2  201806 511130F075A04      4.5600   1.333333
3  201805 511130F075A04      4.5000 -13.461538
4  201804 511130F075A04      5.2000         NA
5  201803 511130F075A04          NA         NA
6  201802 511130F075A04      4.9100   1.867220
7  201801 511130F075A04      4.8200  -5.304519
8  201712 511130F075A04      5.0900   2.414487
9  201711 511130F075A04      4.9700  -3.307393
10 201710 511130F075A04      5.1400         NA
11 201709 511130F075A04          NA         NA
12 201708 511130F075A04      5.2900   2.918288
13 201707 511130F075A04      5.1400  66.553255
14 201706 511130F075A04      3.0861 -10.664351
15 201705 511130F075A04      3.4545  -7.241824

The problem is in rows 4 and 10 of the pct.change column. They are NA, but I would like them to be calculated using the next value of price that is not NA (the nearest one further down the column). The desired output would be (see rows 4 and 10):

       irm     code        price    pct.change
1  201807 511130F075A04      4.6600   2.192982
2  201806 511130F075A04      4.5600   1.333333
3  201805 511130F075A04      4.5000 -13.461538
4  201804 511130F075A04      5.2000   5.906314
5  201803 511130F075A04          NA         NA
6  201802 511130F075A04      4.9100   1.867220
7  201801 511130F075A04      4.8200  -5.304519
8  201712 511130F075A04      5.0900   2.414487
9  201711 511130F075A04      4.9700  -3.307393
10 201710 511130F075A04      5.1400  -2.835539
11 201709 511130F075A04          NA         NA
12 201708 511130F075A04      5.2900   2.918288
13 201707 511130F075A04      5.1400  66.553255
14 201706 511130F075A04      3.0861 -10.664351
15 201705 511130F075A04      3.4545  -7.241824

I tried the standard (x/lead(x) - 1)*100 and several variations using (x/lag(which(!is.na(lead(x)) but I seem to be missing something. Is there a straightforward way to do this in base R or dplyr? I don't want to replace the NAs in price; I want to keep them.

@LAP's comment is probably the best way to do it. The syntax is a little nicer with data.table:

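library(data.table)
setDT(df)  # convert df to a data.table by reference

# Restrict to rows with a non-NA price, so the "lead" value is the next
# non-NA price; rows where price is NA are left untouched.
df[!is.na(price), pct.change := 100*(price/shift(price, type = 'lead') - 1)]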

#        irm          code  price pct.change
#  1: 201807 511130F075A04 4.6600   2.192982
#  2: 201806 511130F075A04 4.5600   1.333333
#  3: 201805 511130F075A04 4.5000 -13.461538
#  4: 201804 511130F075A04 5.2000   5.906314
#  5: 201803 511130F075A04     NA         NA
#  6: 201802 511130F075A04 4.9100   1.867220
#  7: 201801 511130F075A04 4.8200  -5.304519
#  8: 201712 511130F075A04 5.0900   2.414487
#  9: 201711 511130F075A04 4.9700  -3.307393
# 10: 201710 511130F075A04 5.1400  -2.835539
# 11: 201709 511130F075A04     NA         NA
# 12: 201708 511130F075A04 5.2900   2.918288
# 13: 201707 511130F075A04 5.1400  66.553255
# 14: 201706 511130F075A04 3.0861 -10.664351
# 15: 201705 511130F075A04 3.4545         NA
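
Since the question also asks about base R, the same idea (drop the NA rows, take the lead within what remains, and write the result back into the original positions) can be sketched without any packages. The function name pct_change_skip_na is hypothetical, and the sample vector reuses the first seven prices from the question; treat this as a minimal sketch rather than a definitive implementation.

```r
# First seven prices from the question, newest month first
price <- c(4.66, 4.56, 4.50, 5.20, NA, 4.91, 4.82)

# Hypothetical helper: percentage change against the next non-NA value
pct_change_skip_na <- function(x) {
  out  <- rep(NA_real_, length(x))   # NA rows stay NA
  ok   <- which(!is.na(x))           # positions holding a real price
  vals <- x[ok]
  nxt  <- c(vals[-1], NA)            # next non-NA price below each one
  out[ok] <- 100 * (vals / nxt - 1)
  out
}

res <- pct_change_skip_na(price)
res
```

As with the data.table version, the last non-NA row has nothing to compare against, so it comes out as NA.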

In base R, you can instead replace only the affected entries:

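# Rows immediately above each NA price (assumes isolated NAs, none in row 1)
a = which(is.na(df$price)) - 1
# Recompute those rows against the price two rows down, skipping the NA
transform(df, pct.change = replace(pct.change, a, 100*(price[a]/price[a+2] - 1)))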
      irm          code  price pct.change
1  201807 511130F075A04 4.6600   2.192982
2  201806 511130F075A04 4.5600   1.333333
3  201805 511130F075A04 4.5000 -13.461538
4  201804 511130F075A04 5.2000   5.906314
5  201803 511130F075A04     NA         NA
6  201802 511130F075A04 4.9100   1.867220
7  201801 511130F075A04 4.8200  -5.304519
8  201712 511130F075A04 5.0900   2.414487
9  201711 511130F075A04 4.9700  -3.307393
10 201710 511130F075A04 5.1400  -2.835539
11 201709 511130F075A04     NA         NA
12 201708 511130F075A04 5.2900   2.918288
13 201707 511130F075A04 5.1400  66.553255
14 201706 511130F075A04 3.0861 -10.664351
15 201705 511130F075A04 3.4545  -7.241824
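
Note that the fixed a+2 offset works here because every NA in price is isolated. If the data could contain runs of consecutive NAs, one possible variant is to look up, for each row, the index of the nearest non-NA price below it. The variable names next_ok and last_seen and the shortened sample vector are illustrative assumptions, not part of the original answer.

```r
# Illustrative prices with two consecutive NAs
price <- c(5.20, NA, NA, 4.91, 4.82)

# For each row, index of the nearest non-NA price strictly below it
next_ok   <- rep(NA_integer_, length(price))
last_seen <- NA_integer_
for (i in rev(seq_along(price))) {
  next_ok[i] <- last_seen
  if (!is.na(price[i])) last_seen <- i
}

# Indexing with NA yields NA, so rows without a later price stay NA,
# and rows whose own price is NA stay NA as well
pct <- 100 * (price / price[next_ok] - 1)
pct
```

Row 1 is compared against row 4 (5.20 versus 4.91), skipping both NA rows.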
