Confused about creating new columns efficiently in R using mutate and rollmean applied only to numeric columns
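I have the following dataset (a sample of a much larger one). Note that any series may have NAs at the start, although the sample data below does not show this.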
df <- data.frame(Date = rev(seq(as.Date("2021-01-01"), as.Date("2021-01-20"), "day")),
                 Var1 = sample(10:100, 20, replace = TRUE),
                 Var2 = sample(10:100, 20, replace = TRUE),
                 Var3 = sample(10:100, 20, replace = TRUE),
                 Var4 = sample(10:100, 20, replace = TRUE),
                 Var5 = letters[1:20],
                 Var6 = letters[7:26]
)
I want to compute rolling means for each series, e.g. for Var1: a 3-day, a 4-day, and a 7-day rolling mean.
I can do this with the following:
library(dplyr); library(zoo)
df <- tibble(df) %>%
  mutate(Var1_3DMA = rollmean(Var1, k = 3, fill = NA)) %>%
  mutate(Var1_4DMA = rollmean(Var1, k = 4, fill = NA)) %>%
  mutate(Var1_7DMA = rollmean(Var1, k = 7, fill = NA))
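I want to do this for all numeric variables only (i.e. leave the date and character variables untouched and compute the rolling means for the rest).
I would also like the new (mutated) variables to be named with the original name followed by an underscore (_) and the window label, i.e. Var1_3DMA, Var1_4DMA, Var1_7DMA, Var2_3DMA, Var2_4DMA, Var2_7DMA, and so on.
Is there an efficient way to do this in R?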
library(dplyr); library(zoo)
df %>%
  mutate(across(Var1:Var4,
                list(`3DMA` = ~ rollmean(.x, k = 3, fill = NA),
                     `4DMA` = ~ rollmean(.x, k = 4, fill = NA),
                     `7DMA` = ~ rollmean(.x, k = 7, fill = NA))))
Date Var1 Var2 Var3 Var4 Var5 Var6 Var1_3DMA Var1_4DMA Var1_7DMA Var2_3DMA Var2_4DMA Var2_7DMA Var3_3DMA Var3_4DMA Var3_7DMA Var4_3DMA Var4_4DMA Var4_7DMA
1 2021-01-20 77 76 53 57 a g NA NA NA NA NA NA NA NA NA NA NA NA
2 2021-01-19 94 76 99 33 b h 80.00000 72.50 NA 67.33333 73.00 NA 79.00000 70.25 NA 35.00000 36.00 NA
3 2021-01-18 69 50 85 15 c i 71.00000 62.50 NA 72.00000 60.75 NA 76.00000 80.00 NA 29.00000 29.75 NA
4 2021-01-17 50 90 44 39 d j 52.00000 45.25 63.71429 55.66667 64.00 71.85714 73.66667 69.75 74.42857 28.66667 39.75 40.85714
5 2021-01-16 37 27 92 32 e k 37.33333 51.50 64.28571 68.66667 75.25 65.71429 64.66667 71.00 70.14286 48.00000 45.25 40.14286
6 2021-01-15 25 89 58 73 f l 52.00000 59.25 65.14286 70.33333 61.00 68.57143 80.00000 65.75 67.71429 47.33333 48.50 43.57143
7 2021-01-14 94 95 90 37 g m 66.66667 75.00 69.28571 72.33333 78.25 67.14286 57.00000 63.25 66.00000 54.00000 54.75 44.57143
8 2021-01-13 81 33 23 52 h n 91.66667 93.25 70.85714 74.66667 66.00 58.85714 65.00000 67.00 73.42857 48.66667 42.00 52.28571
9 2021-01-12 100 96 82 57 i o 93.00000 85.00 79.28571 56.33333 50.25 57.42857 59.33333 68.50 62.14286 43.66667 56.00 53.57143
10 2021-01-11 98 40 73 22 j p 86.33333 88.75 80.00000 56.00000 46.25 53.71429 83.66667 66.00 57.14286 57.33333 53.25 51.71429
11 2021-01-10 61 32 96 93 k q 85.00000 71.25 73.14286 29.66667 38.00 51.85714 60.66667 51.25 47.00000 52.00000 54.00 57.71429
12 2021-01-09 96 17 13 41 l r 62.33333 58.25 70.57143 37.33333 48.50 52.71429 44.00000 37.75 55.85714 64.66667 68.25 56.42857
13 2021-01-08 30 63 23 60 m s 57.33333 58.75 59.28571 54.00000 50.25 43.00000 18.33333 35.00 46.00000 60.00000 55.75 57.71429
14 2021-01-07 46 82 19 79 n t 46.33333 40.00 55.00000 61.33333 53.00 42.57143 42.33333 35.00 43.28571 60.66667 62.00 60.42857
15 2021-01-06 63 39 85 43 o u 43.33333 49.50 60.42857 49.66667 46.50 50.28571 39.00000 42.75 36.00000 62.66667 57.25 50.57143
16 2021-01-05 21 28 13 66 p v 50.66667 62.75 49.28571 34.66667 47.50 55.28571 50.66667 49.25 40.57143 50.00000 43.50 58.71429
17 2021-01-04 68 37 54 41 q w 62.66667 51.50 48.00000 50.33333 50.75 58.28571 37.33333 39.25 44.28571 43.66667 57.25 59.85714
18 2021-01-03 99 86 45 24 r x 61.66667 51.50 NA 58.33333 64.75 NA 48.00000 48.25 NA 54.33333 57.75 NA
19 2021-01-02 18 52 45 98 s y 46.00000 NA NA 74.00000 NA NA 46.33333 NA NA 63.33333 NA NA
20 2021-01-01 21 84 49 68 t z NA NA NA NA NA NA NA NA NA NA NA NA
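The question mentions that a series may start with NAs and that only numeric columns should be touched. As far as I recall, the zoo help page for rollmean says its default method does not handle inputs containing NAs and points to rollapply instead, so here is a minimal sketch of the same across() pattern built on rollapply. The toy data frame df_na, the helper name roll_mean_na, and the switch from Var1:Var4 to where(is.numeric) are my own assumptions for illustration, not part of the answer above.

```r
library(dplyr)
library(zoo)

# Toy data (assumed for illustration): Var1 starts with two NAs.
df_na <- data.frame(
  Date = rev(seq(as.Date("2021-01-01"), as.Date("2021-01-08"), "day")),
  Var1 = c(NA, NA, 10, 20, 30, 40, 50, 60),
  Var2 = c(5, 15, 25, 35, 45, 55, 65, 75),
  Var5 = letters[1:8]
)

# Hypothetical helper: centered rolling mean via rollapply().
# The result is NA where the window contains an NA or runs past the ends.
roll_mean_na <- function(x, k) {
  rollapply(x, width = k, FUN = function(w) mean(w), fill = NA)
}

res_na <- df_na %>%
  mutate(across(
    where(is.numeric),                 # skips the Date and character columns
    list(`3DMA` = ~ roll_mean_na(.x, 3),
         `4DMA` = ~ roll_mean_na(.x, 4)),
    .names = "{.col}_{.fn}"            # e.g. Var1_3DMA
  ))

res_na$Var1_3DMA
# NA NA NA 20 30 40 50 NA
```

Only the windows that actually overlap the leading NAs come back as NA; the rest of the series is still averaged. rollapply is generally slower than rollmean, so on a large dataset it may be worth using it only for the columns that really contain NAs.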
You can use map to pass multiple k values to rollmean.
library(dplyr)
library(purrr)
library(zoo)
k <- c(3, 4, 7)
map(k, function(n) df %>% mutate(across(where(is.numeric),
      ~rollmean(., n, fill = NA), .names = '{sprintf("%s_%dDMA", .col, n)}'))) %>%
  reduce(inner_join, by = names(df))
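If the joins turn out to be costly on a large dataset, a possible alternative is to build one function per window size and hand the whole named list to a single across() call. This is only a sketch under my own assumptions: df_small is toy data, and make_roller and rollers are hypothetical names I introduced for illustration.

```r
library(dplyr)
library(purrr)
library(zoo)

set.seed(42)  # only so the toy data is reproducible
df_small <- data.frame(
  Date = rev(seq(as.Date("2021-01-01"), as.Date("2021-01-10"), "day")),
  Var1 = sample(10:100, 10, replace = TRUE),
  Var2 = sample(10:100, 10, replace = TRUE),
  Var5 = letters[1:10]
)

k <- c(3, 4, 7)

# Hypothetical factory: returns a rolling-mean function for one window size.
make_roller <- function(n) {
  force(n)  # evaluate n now so each closure keeps its own window size
  function(x) rollmean(x, k = n, fill = NA)
}

# Named list of functions; the names fill the {.fn} slot of .names.
rollers <- set_names(map(k, make_roller), paste0(k, "DMA"))

res <- df_small %>%
  mutate(across(where(is.numeric), rollers, .names = "{.col}_{.fn}"))

names(res)
```

The new columns come out grouped by variable (Var1_3DMA, Var1_4DMA, Var1_7DMA, Var2_3DMA, ...), whereas the map/reduce version groups them by window size; the values should be the same either way.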