How to normalize stock price data
Note: Dates are formatted as DD.MM.
I have the closing prices for a number of companies (here: A, B, C) over a time frame (here: Jan 1st to Jan 5th). The df looks like this:
df1 <- data.frame(date = c("01.01.", "02.01.", "03.01.", "04.01.", "05.01."),
A = c(102, 103, 107, 120, 134),
B = c(94, 95, 100, 93, 90),
C = c(55, 53, 50, 51, 48))
The way I want to normalize the data is by using the z-score, z = (x - μ) / σ, meaning that for A on 01.01. this would be (102 - 113.2) / 13.66382 ≈ -0.8197.
How do I apply this to all my observations? I'm guessing with the mutate function in dplyr, but I can't seem to figure out how to actually do it.
In dplyr, I think you'll need to use something like across(c(A,B,C), ...).
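A minimal sketch of how that hint might be filled in, assuming dplyr 1.0.0 or later (where across() is available). The helper name z_score, the result name df1_z and the "_z" column suffix are illustrative choices, not something from the question:

```r
library(dplyr)

# Example data from the question
df1 <- data.frame(date = c("01.01.", "02.01.", "03.01.", "04.01.", "05.01."),
                  A = c(102, 103, 107, 120, 134),
                  B = c(94, 95, 100, 93, 90),
                  C = c(55, 53, 50, 51, 48))

# Hypothetical helper: z-score using the sample standard deviation (n - 1)
z_score <- function(x) (x - mean(x)) / sd(x)

# .names keeps the original prices and adds A_z, B_z, C_z next to them
df1_z <- df1 %>%
  mutate(across(c(A, B, C), z_score, .names = "{.col}_z"))
```

Dropping the .names argument would overwrite A, B and C with their z-scores instead of adding new columns.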
Just to offer an alternative method using data.table, which updates the table by reference, i.e. there is no need to write something like df1 <- df1 %>% ... in this situation.
library(data.table)
setDT(df1)
cols <- c("A","B","C")
df1[, (cols) := lapply(.SD, function(x) (x - mean(x))/sd(x)), .SDcols = cols]
df1
date A B C
1: 01.01. -0.8196829 -0.1096817 1.3324198
2: 02.01. -0.7464969 0.1645225 0.5921866
3: 03.01. -0.4537530 1.5355438 -0.5181632
4: 04.01. 0.4976646 -0.3838859 -0.1480466
5: 05.01. 1.5222682 -1.2064987 -1.2583965
For more information, see Introduction to data.table.
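Because := modifies the table in place, the raw prices are gone after the update. If they are still needed, one option is to work on a copy. A small sketch, assuming the same example data; the names dt and dt_z are illustrative:

```r
library(data.table)

# Example data from the question, built directly as a data.table
dt <- data.table(date = c("01.01.", "02.01.", "03.01.", "04.01.", "05.01."),
                 A = c(102, 103, 107, 120, 134),
                 B = c(94, 95, 100, 93, 90),
                 C = c(55, 53, 50, 51, 48))

cols <- c("A", "B", "C")

# copy() makes a deep copy, so the := update below leaves dt untouched
dt_z <- copy(dt)
dt_z[, (cols) := lapply(.SD, function(x) (x - mean(x)) / sd(x)), .SDcols = cols]
```

A plain assignment such as dt_z <- dt would not be enough here, since both names would then point at the same table.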
In addition to the @diomedesdata solution, your question asked for the dplyr solution. I believe here is an approach that would work for your data:
# Install dplyr only if it is missing, then load it
if (!requireNamespace("dplyr", quietly = TRUE)) install.packages("dplyr")
library(dplyr)
df1 <- data.frame(date = c("01.01.", "02.01.", "03.01.", "04.01.", "05.01."),
A = c(102, 103, 107, 120, 134),
B = c(94, 95, 100, 93, 90),
C = c(55, 53, 50, 51, 48))
# Overwrite A, B and C with their z-scores
df1 <- df1 %>%
  mutate(across(.cols = A:C,
                .fns = function(x) (x - mean(x)) / sd(x)))
Actually, no package is required at all; write a function and lapply it over the respective columns.
z <- \(x) (x - mean(x)) / sd(x)
transform(df1, z=lapply(df1[-1], z))
# date A B C z.A z.B z.C
# 1 01.01. 102 94 55 -0.8196829 -0.1096817 1.3324198
# 2 02.01. 103 95 53 -0.7464969 0.1645225 0.5921866
# 3 03.01. 107 100 50 -0.4537530 1.5355438 -0.5181632
# 4 04.01. 120 93 51 0.4976646 -0.3838859 -0.1480466
# 5 05.01. 134 90 48 1.5222682 -1.2064987 -1.2583965
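Base R also ships scale(), which by default centers each column on its mean and divides by its sample standard deviation, so it should give the same z-scores. A minimal sketch under that assumption; df1_z is an illustrative name:

```r
# Example data from the question
df1 <- data.frame(date = c("01.01.", "02.01.", "03.01.", "04.01.", "05.01."),
                  A = c(102, 103, 107, 120, 134),
                  B = c(94, 95, 100, 93, 90),
                  C = c(55, 53, 50, 51, 48))

# scale() returns a matrix, so convert it back before replacing the price columns
df1_z <- df1
df1_z[-1] <- as.data.frame(scale(df1[-1]))

# Sanity check: every normalized column should have mean ~0 and sd 1
colMeans(df1_z[-1])
sapply(df1_z[-1], sd)
```

The mean/sd check at the end works for any of the approaches in this thread.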