(base R) How to apply `diff()` to a column and append it as a new data.frame column?
I have a data frame `df1` like this:
| time | Diamond.Hands | returns | volume | close |
|---|---|---|---|---|
| 2021-02-16 10:00:00 | 0.4583333 | 0.0056710775 | 10059 | 53.20 |
| 2021-02-16 11:00:00 | 0.2352941 | -0.0037586920 | 8664 | 53.01 |
| 2021-02-16 12:00:00 | 0.4400000 | -0.0037586920 | 10059 | 52.40 |

# Log return
prices <- df1$close
log_returns <- diff(log(prices), lag=1)
df1$logreturns <- log_returns
This returns the error:
Fehler in `$<-.data.frame`(`*tmp*`, logreturns, value = c(0.000187952260679136, :
Ersetzung hat 2219 Zeilen, Daten haben 2220
(In English: "replacement has 2219 rows, data has 2220".)
Do you have any ideas how to fix that?
When you do
y <- diff(x, lag = m, differences = k)
the resulting vector `y` has `m * k` fewer elements than `x`. If you want to have both `x` and `y` as data.frame/matrix columns, you need to pad `y` with `m * k` leading `NA`s.
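As a rough illustration of the length rule above, the sketch below checks how many elements `diff()` drops and pads the result back to the original length. The helper `padded_diff()` and the example vector are hypothetical names introduced here, not part of base R or of the question.

```r
# Hypothetical helper (not part of base R): diff() with leading NAs
# so that the result has the same length as the input.
padded_diff <- function(x, lag = 1L, differences = 1L) {
  d <- diff(x, lag = lag, differences = differences)
  # Pad with as many NAs as diff() removed
  c(rep(NA, length(x) - length(d)), d)
}

x <- c(2, 3, 5, 8, 13, 21)           # made-up example data
y <- diff(x, lag = 2, differences = 2)

length(x) - length(y)                # number of elements lost: m * k = 2 * 2

padded <- padded_diff(x, lag = 2, differences = 2)
length(padded) == length(x)          # padded result lines up with x again
```

Computing the pad length as `length(x) - length(d)` rather than hard-coding `lag * differences` also covers the edge case where `x` is too short and `diff()` returns an empty vector.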
In your case, `m = 1` and `k = 1`, so you need to pad one `NA`:
df1$logreturns <- c(NA, log_returns)
More concisely, we can pack your 3 lines of code into 1:
df1$logreturns <- c(NA, diff(log(df1$close)))
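To see the one-liner in action, here is a minimal sketch that rebuilds only the `time` and `close` columns from the three sample rows in the question (the real data has 2220 rows; the `tz = "UTC"` setting is an assumption made for reproducibility).

```r
# Small stand-in for df1, using the three rows shown in the question
df1 <- data.frame(
  time  = as.POSIXct(c("2021-02-16 10:00:00",
                       "2021-02-16 11:00:00",
                       "2021-02-16 12:00:00"), tz = "UTC"),
  close = c(53.20, 53.01, 52.40)
)

# Pad one leading NA so the new column has as many rows as df1
df1$logreturns <- c(NA, diff(log(df1$close)))

is.na(df1$logreturns[1])   # first row has no previous price to compare with
df1
```

Each non-missing entry is `log(close[i] / close[i - 1])`, so row `i` holds the log return from the previous hour to the current one.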
Remark:
If you want to know how to do `mutate()` + `diff()` in dplyr, then maybe something like:
df1 %>% mutate(logreturns = c(NA, diff(log(close))))
Here is another possibly related Q & A: Error when using "diff" function inside of dplyr mutate.