How to Calculate Conditional Average (Volume Weighted) by TimeDate Group in R?
I am trying to calculate the volume-weighted average price (VWAP) from trade data, grouped by identical DateTime values. Sample data from a small data frame (20,000 entries) is shown below:
testdata[,c(5,8,10,11)]
transactiontime price volume totalEquity
334 2014-12-01 01:30:00.110000 19330 500000 966500
335 2014-12-01 01:30:00.830000 19340 8000 15472
336 2014-12-01 01:30:00.830000 19340 1000 1934
337 2014-12-01 01:30:00.830000 19340 1000 1934
338 2014-12-01 01:30:01 19340 500 967
339 2014-12-01 01:30:01 19340 2000 3868
340 2014-12-01 01:30:01 19340 4000 7736
341 2014-12-01 01:30:01 19340 40000 77360
342 2014-12-01 01:30:01 19340 500000 967000
343 2014-12-01 01:30:01 19340 12000 23208
where totalEquity is a column added by workdata$totalEquity <- (workdata$price)/10000 * (workdata$volume)
I want to calculate VWAP by transaction time. Using aggregate, it is easy to calculate mean(totalEquity), but how can I calculate the volume-weighted average to get something like:
group.1 transactiontime weightedPrice
1 2014-12-01 01:30:00.110000 1.933
2 2014-12-01 01:30:00.830000 1.934
3 2014-12-01 01:30:01 1.934
where weightedPrice is sum(totalEquity)/sum(volume), grouped by transactiontime.
I've searched through many questions on group means but could not work out how to combine the functions correctly. None of my attempts, shown below, worked, and I was very frustrated:
volWeighted <- function(x=workdata$totalEquity,y=workdata$volume) {sum(x)/sum(y)}
aggregate(totalEquity~transactiontime, testdata, FUN=volWeighted)
or
library(data.table)
dt[,list(avg_tte <- sum(testdata$totalEquity)/sum(testdata$volume)),
'testdata$transactiontime']
or
setDT(testdata) [, time.diff :=max(time)-min(time), by=transactiontime]
[, if(time.diff==0)
.( totalEquity = sum(totalEquity)/sum(volume))
else .SD, by = .(transactiontime, time.diff)]
This is my first question and I tried to make it efficient, but if it happens to be a duplicate, please let me know; I am very willing to learn from the previous question.
You were close on all your attempts.
Instead of aggregate, try by:
by(workdata, workdata$transactiontime, function (x) sum(x$totalEquity)/sum(x$volume))
This will return just the values of the weightedPrice column; it is up to you to add them to the data frame.
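As a rough sketch of that last step, the result of by can be turned into a data frame by pairing the group labels with the computed values. The four-row workdata below is a hypothetical cut-down sample built from rows shown in the question, and the name vwap is an assumption used only for illustration.

```r
# Hypothetical mini sample based on rows 334-338 of the question's data
workdata <- data.frame(
  transactiontime = c("2014-12-01 01:30:00.110000",
                      "2014-12-01 01:30:00.830000",
                      "2014-12-01 01:30:00.830000",
                      "2014-12-01 01:30:01"),
  price  = c(19330, 19340, 19340, 19340),
  volume = c(500000, 8000, 1000, 500),
  stringsAsFactors = FALSE
)
# Same helper column as in the question
workdata$totalEquity <- workdata$price / 10000 * workdata$volume

# One value per transactiontime group
res <- by(workdata, workdata$transactiontime,
          function(x) sum(x$totalEquity) / sum(x$volume))

# Pair the group labels (the dimnames of the result) with the values
vwap <- data.frame(
  transactiontime = dimnames(res)[[1]],
  weightedPrice   = as.numeric(res),
  stringsAsFactors = FALSE
)
print(vwap)
```

The group labels come back sorted, so the rows follow the order of the sorted transactiontime values rather than the order of first appearance.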
I'd recommend something like plyr or data.table:
library(plyr)
ddply(workdata, .(transactiontime), summarize,
weightedPrice=sum(totalEquity)/sum(volume))
The summarize function calculates some summary statistic, and ddply calls summarize for each unique transactiontime. The calculation works a bit like the transform function: if you write weightedPrice=sum(totalEquity)/sum(volume), it evaluates the expression by looking up the columns in workdata (so there is no need for workdata$columnname) and assigns the result to a column called weightedPrice.
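To make the per-group evaluation concrete, here is a rough base R equivalent of the split, apply, combine pattern. It is only a sketch of the idea, not how plyr is implemented internally; the small workdata sample and the names pieces, rows, and result are assumptions for illustration.

```r
# Hypothetical mini sample based on rows shown in the question
workdata <- data.frame(
  transactiontime = c("2014-12-01 01:30:00.110000",
                      "2014-12-01 01:30:00.830000",
                      "2014-12-01 01:30:00.830000",
                      "2014-12-01 01:30:01"),
  price  = c(19330, 19340, 19340, 19340),
  volume = c(500000, 8000, 1000, 500),
  stringsAsFactors = FALSE
)
workdata$totalEquity <- workdata$price / 10000 * workdata$volume

# Split: one sub data frame per unique transactiontime
pieces <- split(workdata, workdata$transactiontime)

# Apply: evaluate the expression inside each piece, so column names
# resolve to that group's rows only (no workdata$ prefix needed)
rows <- lapply(pieces, function(piece) {
  data.frame(
    transactiontime = piece$transactiontime[1],
    weightedPrice   = with(piece, sum(totalEquity) / sum(volume)),
    stringsAsFactors = FALSE
  )
})

# Combine: stack the one-row results back into a single data frame
result <- do.call(rbind, rows)
rownames(result) <- NULL
print(result)
```

This also suggests why the attempts in the question went wrong: referring to workdata$totalEquity or testdata$volume inside the grouped expression always reads the whole column, so every group ends up with the same overall ratio.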
With data.table:
library(data.table)
setDT(workdata) # turn workdata into data.table
workdata[, list(weightedPrice=sum(totalEquity)/sum(volume)), by=transactiontime]
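As a possible variation, the helper column can be skipped by weighting price by volume directly with weighted.mean from base R's stats package. This is a sketch under the assumption that the division by 10000 from the question's totalEquity definition is still the desired scaling; the trades table is a hypothetical mini sample built from rows in the question.

```r
library(data.table)

# Hypothetical mini sample based on rows shown in the question
trades <- data.table(
  transactiontime = c("2014-12-01 01:30:00.110000",
                      "2014-12-01 01:30:00.830000",
                      "2014-12-01 01:30:00.830000",
                      "2014-12-01 01:30:01"),
  price  = c(19330, 19340, 19340, 19340),
  volume = c(500000, 8000, 1000, 500)
)

# weighted.mean(price, volume) equals sum(price * volume) / sum(volume);
# dividing by 10000 mirrors the scaling used for totalEquity in the question
vwap <- trades[, .(weightedPrice = weighted.mean(price, volume) / 10000),
               by = transactiontime]
print(vwap)
```

Inside the brackets, price and volume refer to the current group's rows only, which is what makes the by argument take effect.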