
How to normalize stock price data

Note: Dates are formatted as DD.MM.

I have the closing prices for a number of companies (here: A, B, C) over a time frame (here: Jan 1st to Jan 5th). The data frame looks like this:

df1 <- data.frame(date = c("01.01.", "02.01.", "03.01.", "04.01.", "05.01."),
                  A = c(102, 103, 107, 120, 134),
                  B = c(94, 95, 100, 93, 90),
                  C = c(55, 53, 50, 51, 48))

I want to normalize the data using the z-score, "z = (x - μ) / σ", where μ is the column mean and σ is the column's sample standard deviation. For A on 01.01., this would be (102 - 113.2) / 13.66382 ≈ -0.8197.
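As a quick check of the formula, here is a minimal sketch that computes the z-score for column A by hand in base R. The variable names (mu, sigma, z_first, z_all) are only illustrative; note that sd() uses the sample standard deviation (n - 1 denominator).

```r
# Closing prices for company A, copied from the question
A <- c(102, 103, 107, 120, 134)

mu    <- mean(A)  # column mean
sigma <- sd(A)    # sample standard deviation (n - 1 denominator)

# z-score of the first observation (01.01.)
z_first <- (A[1] - mu) / sigma

# The same formula is vectorised, so it works on the whole column at once
z_all <- (A - mu) / sigma

print(mu)
print(sigma)
print(z_first)
print(z_all)
```

After this transformation the column has mean 0 and standard deviation 1, which is what makes differently priced stocks comparable.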

How do I apply this to all my observations? I'm guessing with the mutate function in dplyr, but I can't seem to figure out how to actually do it.

In dplyr, I think you'll need to use something like across(c(A,B,C), ...).

Just to offer an alternative method using data.table, which updates the table by reference, i.e. there is no need to write something like df1 <- df1 %>% ... in this situation.

library(data.table)
setDT(df1)


cols <- c("A","B","C")

df1[, (cols) := lapply(.SD, function(x) (x - mean(x))/sd(x)), .SDcols = cols]
df1
     date          A          B          C
1: 01.01. -0.8196829 -0.1096817  1.3324198
2: 02.01. -0.7464969  0.1645225  0.5921866
3: 03.01. -0.4537530  1.5355438 -0.5181632
4: 04.01.  0.4976646 -0.3838859 -0.1480466
5: 05.01.  1.5222682 -1.2064987 -1.2583965
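Because := modifies the table in place, the original prices are overwritten by the call above. If they are still needed, one option is to work on a copy. The following is a sketch under that assumption; the names prices and z_scores are hypothetical and not part of the original answer.

```r
library(data.table)

# Same data as in the question, built directly as a data.table
prices <- data.table(date = c("01.01.", "02.01.", "03.01.", "04.01.", "05.01."),
                     A = c(102, 103, 107, 120, 134),
                     B = c(94, 95, 100, 93, 90),
                     C = c(55, 53, 50, 51, 48))

cols <- c("A", "B", "C")

# copy() makes a deep copy, so the := below does not touch `prices`
z_scores <- copy(prices)
z_scores[, (cols) := lapply(.SD, function(x) (x - mean(x)) / sd(x)), .SDcols = cols]

print(prices)    # still holds the raw closing prices
print(z_scores)  # holds the standardized values
```

Without copy(), a plain assignment such as z_scores <- prices would point at the same table, and both names would show the standardized values.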

For more information, see Introduction to data.table.

In addition to @diomedesdata's solution, your question asked for a dplyr solution. I believe this approach would work for your data:

# Install dplyr only if it is missing, then attach it
if (!requireNamespace("dplyr", quietly = TRUE)) install.packages("dplyr")
library(dplyr)

df1 <- data.frame(date = c("01.01.", "02.01.", "03.01.", "04.01.", "05.01."),
                  A = c(102, 103, 107, 120, 134),
                  B = c(94, 95, 100, 93, 90),
                  C = c(55, 53, 50, 51, 48))

# Replace columns A through C with their z-scores
df1 <- df1 %>%
  mutate(across(.cols = A:C,
                .fns = function(x) (x - mean(x)) / sd(x)))

This returns df1 with columns A, B, and C replaced by their z-scores, the same values as in the data.table output above.
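If the raw prices should stay alongside the standardized ones, across() also takes a .names argument that writes the results to new columns instead of overwriting. A minimal sketch, assuming a "z_" prefix is acceptable (the prefix and the name df_z are illustrative choices, not from the original answer):

```r
library(dplyr)

df1 <- data.frame(date = c("01.01.", "02.01.", "03.01.", "04.01.", "05.01."),
                  A = c(102, 103, 107, 120, 134),
                  B = c(94, 95, 100, 93, 90),
                  C = c(55, 53, 50, 51, 48))

# {.col} is replaced by each input column name, giving z_A, z_B, z_C
df_z <- df1 %>%
  mutate(across(.cols = A:C,
                .fns = function(x) (x - mean(x)) / sd(x),
                .names = "z_{.col}"))

print(df_z)
```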

Actually, no package is required at all; write a function and lapply it over the respective columns (the \(x) shorthand below needs R 4.1 or later).

z <- \(x) (x - mean(x)) / sd(x)
transform(df1, z=lapply(df1[-1], z))
#     date   A   B  C        z.A        z.B        z.C
# 1 01.01. 102  94 55 -0.8196829 -0.1096817  1.3324198
# 2 02.01. 103  95 53 -0.7464969  0.1645225  0.5921866
# 3 03.01. 107 100 50 -0.4537530  1.5355438 -0.5181632
# 4 04.01. 120  93 51  0.4976646 -0.3838859 -0.1480466
# 5 05.01. 134  90 48  1.5222682 -1.2064987 -1.2583965
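Base R also ships scale(), which with its default arguments centers each column on its mean and divides by its standard deviation, so it should give the same numbers as the hand-written function. A short sketch comparing the two (z_manual and z_df are illustrative names):

```r
df1 <- data.frame(date = c("01.01.", "02.01.", "03.01.", "04.01.", "05.01."),
                  A = c(102, 103, 107, 120, 134),
                  B = c(94, 95, 100, 93, 90),
                  C = c(55, 53, 50, 51, 48))

# scale() returns a numeric matrix (with centering/scaling attributes)
z_scale <- scale(df1[-1])

# Hand-written z-score applied to the same columns, for comparison
z_manual <- sapply(df1[-1], function(x) (x - mean(x)) / sd(x))

# Compare the raw values, ignoring the extra attributes scale() attaches
print(all.equal(as.vector(z_scale), as.vector(z_manual)))

# Put the date column back in front of the standardized columns
z_df <- cbind(df1["date"], as.data.frame(z_scale))
print(z_df)
```

Since scale() returns a matrix rather than a data frame, the as.data.frame() step is needed before binding the date column back on.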
