How to normalize stock price data

Question

Note: Dates are formatted as DD.MM.

I have the closing prices for a number of companies (here: A, B, C) over a time frame (here: Jan 1st to Jan 5th). The df looks like this:

df1 <- data.frame(date = c("01.01.", "02.01.", "03.01.", "04.01.", "05.01."),
                  A = c(102, 103, 107, 120, 134),
                  B = c(94, 95, 100, 93, 90),
                  C = c(55, 53, 50, 51, 48))

I want to normalize the data using the z-score, "z = (x - μ) / σ", meaning that for A on 01.01. this would be (102 - 113.2) / 13.66382 = -0.8197...
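For reference, the hand calculation for a single observation can be reproduced in base R. This is only a minimal check, not part of the original question: mean() gives the exact column mean and sd() gives the sample standard deviation (n - 1 denominator), so the result can differ slightly from figures rounded by hand. The variable names mu, sigma and z_first are made up for this illustration.

```r
# Closing prices of company A from the example data
A <- c(102, 103, 107, 120, 134)

mu    <- mean(A)  # exact mean of the five prices
sigma <- sd(A)    # sample standard deviation (n - 1 denominator)

# z-score of the first observation (01.01.)
z_first <- (A[1] - mu) / sigma
print(z_first)
```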

How do I apply this to all my observations? I'm guessing with the mutate function in dplyr, but I can't seem to figure out how to actually do it.

Answer 1

In dplyr, I think you'll need to use something like across(c(A,B,C), ...).

Just to offer an alternative method using data.table, which updates the table by reference, i.e., there is no need to write something like df1 <- df1 %>% ... in this situation.

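library(data.table)
setDT(df1)  # convert df1 to a data.table in place

cols <- c("A", "B", "C")

# Overwrite each column in cols with its z-score, by reference
df1[, (cols) := lapply(.SD, function(x) (x - mean(x)) / sd(x)), .SDcols = cols]
df1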

     date          A          B          C
1: 01.01. -0.8196829 -0.1096817  1.3324198
2: 02.01. -0.7464969  0.1645225  0.5921866
3: 03.01. -0.4537530  1.5355438 -0.5181632
4: 04.01.  0.4976646 -0.3838859 -0.1480466
5: 05.01.  1.5222682 -1.2064987 -1.2583965

For more information, see Introduction to data.table.
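If the raw prices should be kept, the same by-reference idiom can add new columns instead of overwriting the existing ones. The following is a sketch under that assumption; the z_ prefix and the names dt and z_cols are hypothetical and do not come from the answer.

```r
library(data.table)

# Same example data as in the question, built directly as a data.table
dt <- data.table(date = c("01.01.", "02.01.", "03.01.", "04.01.", "05.01."),
                 A = c(102, 103, 107, 120, 134),
                 B = c(94, 95, 100, 93, 90),
                 C = c(55, 53, 50, 51, 48))

cols   <- c("A", "B", "C")
z_cols <- paste0("z_", cols)  # hypothetical names for the new columns

# := adds the z-score columns in place; dt is modified without reassignment
dt[, (z_cols) := lapply(.SD, function(x) (x - mean(x)) / sd(x)), .SDcols = cols]
print(dt)
```

Because := modifies dt itself, any other variable pointing at the same table sees the new columns too; copy(dt) can be used first when an untouched copy is needed.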

Answer 2

In addition to the @diomedesdata solution, your question asked for a dplyr solution. I believe this approach would work for your data:

# Install dplyr only if it is missing, then attach it
if (!requireNamespace("dplyr", quietly = TRUE)) install.packages("dplyr")
library(dplyr)

df1 <- data.frame(date = c("01.01.", "02.01.", "03.01.", "04.01.", "05.01."),
                  A = c(102, 103, 107, 120, 134),
                  B = c(94, 95, 100, 93, 90),
                  C = c(55, 53, 50, 51, 48))

# Replace columns A to C with their z-scores
df1 <- df1 %>%
  mutate(across(.cols = A:C,
                .fns = function(x) (x - mean(x)) / sd(x)))

This returns df1 with columns A to C replaced by their z-scores, the same values as in the @diomedesdata output above.
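As a self-contained way to reproduce and check that result, the sketch below uses the .names argument of across() so the z-scores land in new columns next to the raw prices. The z_ prefix and the name df_z are assumptions made for this example, not part of the answer.

```r
library(dplyr)

# Same example data as in the question
df1 <- data.frame(date = c("01.01.", "02.01.", "03.01.", "04.01.", "05.01."),
                  A = c(102, 103, 107, 120, 134),
                  B = c(94, 95, 100, 93, 90),
                  C = c(55, 53, 50, 51, 48))

# Keep A, B, C and add z_A, z_B, z_C (hypothetical names) holding the z-scores
df_z <- df1 %>%
  mutate(across(A:C, ~ (.x - mean(.x)) / sd(.x), .names = "z_{.col}"))

print(df_z)
```

Each new column should have mean 0 and standard deviation 1, which can be confirmed with colMeans() and sapply(..., sd) on the z_ columns.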

Answer 3

Actually, no package is required at all; write a function and lapply it over the respective columns.

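# \(x) is shorthand for function(x), available from R 4.1.0
z <- \(x) (x - mean(x)) / sd(x)
# Apply z to every column except date; results are appended as z.A, z.B, z.C
transform(df1, z = lapply(df1[-1], z))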
#     date   A   B  C        z.A        z.B        z.C
# 1 01.01. 102  94 55 -0.8196829 -0.1096817  1.3324198
# 2 02.01. 103  95 53 -0.7464969  0.1645225  0.5921866
# 3 03.01. 107 100 50 -0.4537530  1.5355438 -0.5181632
# 4 04.01. 120  93 51  0.4976646 -0.3838859 -0.1480466
# 5 05.01. 134  90 48  1.5222682 -1.2064987 -1.2583965
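Base R also ships scale(), which centers each column on its mean and divides by its sample standard deviation by default, so it should give the same z-scores as the hand-written function. This is an alternative sketch rather than the answer's method; the names num_cols and df_scaled are made up for the example.

```r
# Same example data as in the question
df1 <- data.frame(date = c("01.01.", "02.01.", "03.01.", "04.01.", "05.01."),
                  A = c(102, 103, 107, 120, 134),
                  B = c(94, 95, 100, 93, 90),
                  C = c(55, 53, 50, 51, 48))

num_cols  <- c("A", "B", "C")  # columns to standardize
df_scaled <- df1               # work on a copy so df1 keeps the raw prices

# scale() returns a matrix; assigning it back fills the selected columns
df_scaled[num_cols] <- scale(df1[num_cols])
print(df_scaled)
```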
