
How to count the number of rows a condition has been met

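I have a data frame df of stock liquidity data (dput statement below), and for each row I need to count how many of the previous three rows have a liquidity value > 10: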

  date     sec_id  liquidity   good       count.good.rows
2016-07-29   3277  9.142245 FALSE               0
2016-08-31   3277 11.070555  TRUE               0
2016-09-30   3277 11.934113  TRUE               1
2016-10-31   3277 12.192237  TRUE               2
2016-11-30   3277 10.165183  TRUE               3
2016-12-30   3277  8.414033 FALSE               3
2016-01-29   3426  6.494181 FALSE               0
2016-02-29   3426  8.216213 FALSE               0
2016-03-31   3426 10.081115  TRUE               0
2016-04-29   3426 10.119685  TRUE               1
2016-05-31   3426  8.659732 FALSE               2
2016-06-30   3426  6.790178 FALSE               2
2016-07-29   3426  7.234159 FALSE               1

Note a few things about the data:

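  1. There are multiple sec_id values, and I need this to work for each sec_id separately, following the order of the date column.
  2. I have already added the good column, but I can't figure out how to compute count.good.rows without explicitly writing lag(...,1) + lag(...,2) + lag(...,3). That would be a poor solution because I need the 3 to be a variable (I may eventually want to look at the previous 2 or 4 rows).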
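For reference, the hard-coded approach described in point 2 might look roughly like the sketch below. It assumes dplyr's lag() and a short illustrative vector rather than the full data frame; the name count_prev3 is made up for this example. The window size is baked into the number of lag() terms, which is exactly the limitation described.

```r
library(dplyr)

# Illustrative vector: the good column for the last six rows of sec_id 3277
good <- c(FALSE, TRUE, TRUE, TRUE, TRUE, FALSE)

# Hard-coded window of 3: one lag() term per look-back step.
# default = FALSE treats rows before the start of the series as "not good".
count_prev3 <- lag(good, 1, default = FALSE) +
  lag(good, 2, default = FALSE) +
  lag(good, 3, default = FALSE)

count_prev3
# 0 0 1 2 3 3
```

Changing the window to 2 or 4 rows means adding or deleting terms by hand.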

Any ideas?

Here is my full dput:

df = structure(list(date = structure(c(16829, 16860, 16891, 16920, 16952, 16982, 17011, 17044, 17074, 17105, 17135, 17165, 16829, 16860, 16891, 16920, 16952, 16982, 17011, 17044, 17074, 17105, 17135, 17165), class = "Date"),
    sec_id = c(3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L),
    liquidity = c(4.014428, 3.779665, 4.833813, 5.244417, 7.150838, 7.639399, 9.142245, 11.070555, 11.934113, 12.192237, 10.165183, 8.414033, 6.494181, 8.216213, 10.081115, 10.119685, 8.659732, 6.790178, 7.234159, 8.529101, 9.015898, 8.307979, 8.231237, 8.711095),
    good = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)),
    class = "data.frame", .Names = c("date", "sec_id", "liquidity", "good"), 
    row.names = c(NA, -24L))

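You could define a lagcounter function that takes the cumulative sum of good minus the cumulative sum from n rows back, then lags the result by one row so the current row is excluded: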

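# y - lag(y, n) counts the TRUEs in the last n rows; the outer lag shifts that window back by one row
lagcounter = function(x, n) {y = cumsum(x); dplyr::lag(y - dplyr::lag(y, n, default = 0), default = 0)}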
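To see why the cumulative-sum difference works, here is a step-by-step trace on a short vector. This is only a sketch in base R: shift() is a hypothetical stand-in for dplyr's lag() written for this illustration, and the vector is the good column for the last six rows of sec_id 3277.

```r
# Hypothetical base-R stand-in for dplyr::lag(v, k, default = fill)
shift <- function(v, k, fill = 0) {
  n <- length(v)
  k <- min(k, n)
  c(rep(fill, k), v[seq_len(n - k)])
}

good <- c(FALSE, TRUE, TRUE, TRUE, TRUE, FALSE)

y           <- cumsum(good)        # 0 0 ... running total of TRUEs: 0 1 2 3 4 4
y_n_back    <- shift(y, 3)         # running total three rows earlier: 0 0 0 0 1 2
in_window   <- y - y_n_back        # TRUEs in the current row and the two before it: 0 1 2 3 3 2
prev_window <- shift(in_window, 1) # same count for the three rows before the current one: 0 0 1 2 3 3

prev_window
```

The last vector matches the count.good.rows values asked for in the question for those rows.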

Then, using dplyr syntax, group by sec_id and call the newly defined function inside mutate:

library(dplyr)
df %>% group_by(sec_id) %>% mutate(count.good.rows = lagcounter(good,3)) 

         date sec_id liquidity  good count.good.rows
1  2016-01-29   3277  4.014428 FALSE               0
2  2016-02-29   3277  3.779665 FALSE               0
3  2016-03-31   3277  4.833813 FALSE               0
4  2016-04-29   3277  5.244417 FALSE               0
5  2016-05-31   3277  7.150838 FALSE               0
6  2016-06-30   3277  7.639399 FALSE               0
7  2016-07-29   3277  9.142245 FALSE               0
8  2016-08-31   3277 11.070555  TRUE               0
9  2016-09-30   3277 11.934113  TRUE               1
10 2016-10-31   3277 12.192237  TRUE               2
11 2016-11-30   3277 10.165183  TRUE               3
12 2016-12-30   3277  8.414033 FALSE               3
13 2016-01-29   3426  6.494181 FALSE               0
14 2016-02-29   3426  8.216213 FALSE               0
15 2016-03-31   3426 10.081115  TRUE               0
16 2016-04-29   3426 10.119685  TRUE               1
17 2016-05-31   3426  8.659732 FALSE               2
18 2016-06-30   3426  6.790178 FALSE               2
19 2016-07-29   3426  7.234159 FALSE               1
20 2016-08-31   3426  8.529101 FALSE               0
21 2016-09-30   3426  9.015898 FALSE               0
22 2016-10-31   3426  8.307979 FALSE               0
23 2016-11-30   3426  8.231237 FALSE               0
24 2016-12-30   3426  8.711095 FALSE               0

Try zoo::rollapply

library(dplyr)  # for %>%, group_by and mutate
library(zoo)    # for rollapply
df %>%
  group_by(sec_id) %>%
  mutate(count_good_rows = rollapply(good, 3, sum, align="right", partial=TRUE))

# A tibble: 13 x 5
# Groups: sec_id [2]
   # date       sec_id liquidity good  count_good_rows
   # <fctr>      <int>     <dbl> <lgl>           <int>
 # 1 2016-07-29   3277      9.14 F                   0
 # 2 2016-08-31   3277     11.1  T                   1
 # 3 2016-09-30   3277     11.9  T                   2
 # 4 2016-10-31   3277     12.2  T                   3
 # 5 2016-11-30   3277     10.2  T                   3
 # 6 2016-12-30   3277      8.41 F                   2
 # 7 2016-01-29   3426      6.49 F                   0
 # 8 2016-02-29   3426      8.22 F                   0
 # 9 2016-03-31   3426     10.1  T                   1
# 10 2016-04-29   3426     10.1  T                   2
# 11 2016-05-31   3426      8.66 F                   2
# 12 2016-06-30   3426      6.79 F                   1
# 13 2016-07-29   3426      7.23 F                   0

EDIT: If you are only interested in counting over the previous three rows (excluding the current row)

df %>%
  group_by(sec_id) %>%
  mutate(count_good_rows = rollapply(dplyr::lag(good, 1), 3, function(i) sum(i, na.rm=TRUE), align="right", partial=TRUE))

# A tibble: 13 x 5
# Groups: sec_id [2]
   # date       sec_id liquidity good  count_good_rows
   # <fctr>      <int>     <dbl> <lgl>           <int>
 # 1 2016-07-29   3277      9.14 F                   0
 # 2 2016-08-31   3277     11.1  T                   0
 # 3 2016-09-30   3277     11.9  T                   1
 # 4 2016-10-31   3277     12.2  T                   2
 # 5 2016-11-30   3277     10.2  T                   3
 # 6 2016-12-30   3277      8.41 F                   3
 # 7 2016-01-29   3426      6.49 F                   0
 # 8 2016-02-29   3426      8.22 F                   0
 # 9 2016-03-31   3426     10.1  T                   0
# 10 2016-04-29   3426     10.1  T                   1
# 11 2016-05-31   3426      8.66 F                   2
# 12 2016-06-30   3426      6.79 F                   2
# 13 2016-07-29   3426      7.23 F                   1
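Since the question asks for the window size to be a variable, the rollapply call above could be wrapped in a small function. The sketch below is one way to do that; the name count_prev_good and its arguments are made up for this example and are not part of zoo or dplyr.

```r
library(dplyr)
library(zoo)

# Hypothetical wrapper: count the TRUEs among the n rows before each row.
# lag(x, 1) drops the current row from the window; na.rm handles the leading NA.
count_prev_good <- function(x, n) {
  rollapply(dplyr::lag(x, 1), n, function(i) sum(i, na.rm = TRUE),
            align = "right", partial = TRUE)
}

good <- c(FALSE, TRUE, TRUE, TRUE, TRUE, FALSE)

count_prev_good(good, 3)  # 0 0 1 2 3 3
count_prev_good(good, 2)  # 0 0 1 2 2 2
```

Inside the grouped pipeline this would be used as mutate(count_good_rows = count_prev_good(good, 3)), with 3 replaced by whatever look-back is needed.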

Data

df <- read.table(text="date     sec_id  liquidity   good     
2016-07-29   3277  9.142245 FALSE 
2016-08-31   3277 11.070555  TRUE  
2016-09-30   3277 11.934113  TRUE  
2016-10-31   3277 12.192237  TRUE  
2016-11-30   3277 10.165183  TRUE  
2016-12-30   3277  8.414033 FALSE 
2016-01-29   3426  6.494181 FALSE 
2016-02-29   3426  8.216213 FALSE 
2016-03-31   3426 10.081115  TRUE  
2016-04-29   3426 10.119685  TRUE  
2016-05-31   3426  8.659732 FALSE 
2016-06-30   3426  6.790178 FALSE  
2016-07-29   3426  7.234159 FALSE  ", header=TRUE)
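Note that read.table does not parse the date column as a Date (the tibble output above shows it as a factor), and both approaches rely on rows being in date order within each sec_id. The sample data already happens to be sorted, but if that cannot be assumed, a conversion and sort along these lines may be needed first. This is a base-R sketch on a small made-up data frame (df_small), not the data above.

```r
# Small made-up example with rows deliberately out of date order
df_small <- data.frame(
  date   = c("2016-09-30", "2016-07-29", "2016-08-31"),
  sec_id = c(3277L, 3277L, 3277L),
  good   = c(TRUE, FALSE, TRUE)
)

# Parse the dates, then sort by security and date before any rolling count
df_small$date <- as.Date(df_small$date)
df_small <- df_small[order(df_small$sec_id, df_small$date), ]

df_small
```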
