
Combining Data.Tables (R) with a loop or mapply

I am new to data.tables in R and have managed to get 80% of the way through my analysis. The background is that I want to get the returns of a stock over the 5 days before and after each report, and then over the 25 and 45 days after it reports. I have managed to do this for one set of dates (effectively by hardcoding them), but when I try to automate the process it falls apart.

I will start with my current formula and then explain the data.

This formula successfully looks up the data tables and returns the sum that I need. The issue is that datem5 and V1 need to go through a loop (or mapply) to automate the process.

CQ_Date[CQ_DF[CQ_Date, sum(CQ), on = .(unit, date >= date1, date <= datem5),
            by = .EACHI], newvar := V1, on = .(unit, date1 = date)]

I tried this (along with many other variants). Please note that newvar needs to be addressed as well.

for (i in 1:4) {
  CQ_Date[CQ_DF[CQ_Date, sum(CQ), on = .(unit, date >= date1, date <= cols[,..i]),
                by = .EACHI], newvar := v, on = .(unit, date1 = date)]
}

but I get this error:

Error: argument specifying columns specify non existing column(s): cols[3]='cols[, ..i]'

Interestingly, when I try

for (i in 1:2) {
  y <- cols[, ..i]
}

there is no issue.
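The difference is that data.table does not evaluate the expressions inside on as ordinary R code. In the join it treats the right-hand side, cols[, ..i], as the name of a column to look up, which is exactly the non-existent column the error message reports. Outside of on, the same expression is evaluated normally.

One way around this is to copy each window's bounds into ordinary helper columns before joining. This is a minimal sketch under some assumptions. It assumes cols is a plain character vector rather than a one-row data.table, and it uses hypothetical helper columns lo and hi and hypothetical output columns named sum_<window>. It builds the dates with as.Date instead of anytime and sets every CQ to 1, so each sum simply counts the days in its window. Taking pmin/pmax of the bounds matters for datem5: because datem5 is 5 days before date1, the condition date >= date1, date <= datem5 in the hardcoded formula can never match.

```r
library(data.table)

# Example data (assumption): three reporting dates plus the four window ends
CQ_Date <- data.table(date1 = as.Date(c("2016-02-17", "2016-06-12", "2016-08-17")),
                      unit = 1)
CQ_Date[, `:=`(datem5 = date1 - 5, datep5 = date1 + 5,
               datep20 = date1 + 20, datep45 = date1 + 45)]

# Daily "returns" (assumption): a constant 1, so each sum counts the days in the window
CQ_DF <- data.table(unit = 1,
                    date = seq(as.Date("2015-12-25"), by = "day", length.out = 300),
                    CQ = 1)

# A character vector of column names, not a data.table
cols <- c("datem5", "datep5", "datep20", "datep45")

for (col in cols) {
  # Hypothetical helper columns holding the window bounds; pmin/pmax also
  # order the bounds for datem5, whose end lies before date1
  CQ_Date[, `:=`(lo = pmin(date1, get(col)), hi = pmax(date1, get(col)))]
  # by = .EACHI returns one row per row of CQ_Date, in the same order
  res <- CQ_DF[CQ_Date, sum(CQ), on = .(unit, date >= lo, date <= hi), by = .EACHI]
  # Assign by position under a per-window name, e.g. sum_datem5
  CQ_Date[, (paste0("sum_", col)) := res$V1]
}

# Drop the helper columns
CQ_Date[, c("lo", "hi") := NULL]
CQ_Date
```

Because the sums are assigned by position, there is no second join on date1, and the per-window column names take care of the newvar problem.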

Now, in terms of the data:

  • cols just contains the column names I need from CQ_Date:

    cols <- data.table("datem5", "datep5", "datep20", "datep45")

CQ_Date has the reporting dates for the stock CQ, for example:

 CQ_Date <- data.frame("date1" = anydate(c("2016-02-17", "2016-06-12", "2016-08-17")))
 CQ_Date$datem5 <- CQ_Date$date1 - 5  # minus five days
 CQ_Date$datep5 <- CQ_Date$date1 + 5  # plus five days
 CQ_Date$datep20 <- CQ_Date$date1 + 20
 CQ_Date$datep45 <- CQ_Date$date1 + 45
 CQ_Date$unit <- 1    # I guess I need this for some sort of indexing

Then CQ_DF (the daily log returns for the stock) is built with:

 CQ_DF <- data.frame("unit" = rep(1,300))
 CQ_DF$CQ <- rnorm(300)  # one random return per row
 CQ_DF$date <- seq(as.Date("2015-12-25"), by = "day", length.out = 300)
 CQ_DF$unit <- 1

before converting both to data.tables:

setDT(CQ_DF)
setDT(CQ_Date)

Any help would be greatly appreciated. Note that this uses:

  library(data.table)
  library(anytime)     

A simplified version is:

  CQ_Date <- data.frame("date1" = c(10, 20))
  CQ_Date$datep5 <- CQ_Date$date1 + 5  # plus five days
  CQ_Date$datep20 <- CQ_Date$date1 + 10
  CQ_Date$unit <- 1 

  CQ_DF <- data.frame("unit" = rep(1,100))
  CQ_DF$CQ <- seq(1, by = 1, length.out = 100)
  CQ_DF$date <- seq(1, by = 1, length.out = 100)
  CQ_DF$unit <- 1

  setDT(CQ_DF)
  setDT(CQ_Date)

  cols <- c("datep5", "datep20" )

  tmp <- melt(CQ_Date, measure.vars = cols)
  setDT(tmp)

  tmp[CQ_DF[tmp, sum(CQ), on = .(unit, date >= date1, date <= value),
            by = .EACHI], newvar := V1, on = .(unit, date1 = date)]

The issue now is that the sums do not appear to be correct. It may have something to do with the "variable" column.
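The sums from the inner join appear to be right. The problem is more likely the join back. CQ_DF[tmp, ..., by = .EACHI] returns one sum per row of tmp, which means one per reporting date and window. However, on = .(unit, date1 = date) matches on the reporting date alone. Each date1 appears in tmp once per window, so both rows for that date are matched by both sums, and whichever is assigned last overwrites the other.

This is a minimal sketch of one fix, assuming the simplified data above. It keeps the non-equi join and assigns its V1 column by position, relying on by = .EACHI returning rows in the same order as tmp (res is just an illustrative name).

```r
library(data.table)

# The simplified data from the question
CQ_Date <- data.table(date1 = c(10, 20), unit = 1)
CQ_Date[, `:=`(datep5 = date1 + 5, datep20 = date1 + 10)]
CQ_DF <- data.table(unit = 1,
                    CQ = seq(1, by = 1, length.out = 100),
                    date = seq(1, by = 1, length.out = 100))

tmp <- melt(CQ_Date, measure.vars = c("datep5", "datep20"))

# One sum per row of tmp (reporting date x window), in tmp's row order
res <- CQ_DF[tmp, sum(CQ), on = .(unit, date >= date1, date <= value), by = .EACHI]

# Assign by position instead of joining back on date1 alone
tmp[, newvar := res$V1]
tmp
```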

Instead of using mapply or a for loop, try reshaping the dataset into long format with melt, creating the sequence of dates between each pair of bounds, performing the join, and then calculating the sum.

library(data.table)
cols <- c("datep5", "datep20" )

tmp <- melt(CQ_Date, measure.vars = cols)
tmp <- tmp[, list(date = seq(date1, value)), .(unit, variable, date1, value)]
tmp <- merge(tmp, CQ_DF, by = c('unit', 'date'))
tmp[, .(newvar = sum(CQ)), .(unit, variable, date1)]

#   unit variable date1 newvar
#1:    1   datep5    10     75
#2:    1  datep20    10    165
#3:    1   datep5    20    135
#4:    1  datep20    20    275

If you need the data back in wide format, you can use dcast.
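For example, this sketch spreads the long result back out to one row per reporting date. The input values are copied from the output shown above, and the object names long and wide are only illustrative.

```r
library(data.table)

# Long-format result, with values copied from the output above
long <- data.table(unit = 1,
                   variable = c("datep5", "datep20", "datep5", "datep20"),
                   date1 = c(10, 10, 20, 20),
                   newvar = c(75, 165, 135, 275))

# One row per (unit, date1), one column of sums per window
wide <- dcast(long, unit + date1 ~ variable, value.var = "newvar")
wide
```

The window columns in wide hold the sums rather than the window end dates. You may want to rename them, for example with setnames, before merging the result back onto CQ_Date.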


An equivalent tidyverse option is:

library(tidyverse)

CQ_Date %>%
  pivot_longer(cols = all_of(cols)) %>%
  mutate(date = map2(date1, value, seq)) %>%
  unnest(date) %>%
  left_join(CQ_DF, by = c('unit', 'date')) %>%
  group_by(unit, name, date1) %>%
  summarise(newvar = sum(CQ))
