Extract columns from a large data table into smaller data tables and save them in a list

Question

I get a data table (time series for different products, indexed by date) from an external server. The table below shows the maximum set of columns it can have. The date column is always first; each of the other columns may or may not be present, so there might be only two additional columns, for example:

library(data.table)

set.seed(123)
dt.data <- data.table(date = seq(as.Date('2020-01-01'), by = '1 day', length.out = 365),
                      'DEB Cal-2019' = rnorm(365, 2, 1), 'DEB Cal-2021' = rnorm(365, 2, 1),
                      'DEB Cal-2022' = rnorm(365, 2, 1), 'DEB Cal-2023' = rnorm(365, 2, 1),
                      'ATB Cal-2019' = rnorm(365, 2, 1), 'ATB Cal-2021' = rnorm(365, 2, 1),
                      'ATB Cal-2022' = rnorm(365, 2, 1), 'ATB Cal-2023' = rnorm(365, 2, 1),
                      'TTF Cal-2019' = rnorm(365, 2, 1), 'TTF Cal-2021' = rnorm(365, 2, 1),
                      'TTF Cal-2022' = rnorm(365, 2, 1), 'TTF Cal-2023' = rnorm(365, 2, 1),
                      'NCG Cal-2019' = rnorm(365, 2, 1), 'NCG Cal-2021' = rnorm(365, 2, 1),
                      'NCG Cal-2022' = rnorm(365, 2, 1), 'NCG Cal-2023' = rnorm(365, 2, 1),
                      'AUTVTP Cal-2019' = rnorm(365, 2, 1), 'AUTVTP Cal-2021' = rnorm(365, 2, 1),
                      'AUTVTP Cal-2022' = rnorm(365, 2, 1), 'AUTVTP Cal-2023' = rnorm(365, 2, 1),
                      'ATW Cal-2019' = rnorm(365, 2, 1), 'ATW Cal-2021' = rnorm(365, 2, 1),
                      'ATW Cal-2022' = rnorm(365, 2, 1), 'ATW Cal-2023' = rnorm(365, 2, 1),
                      'BRN Cal-2019' = rnorm(365, 2, 1), 'BRN Cal-2021' = rnorm(365, 2, 1),
                      'BRN Cal-2022' = rnorm(365, 2, 1), 'BRN Cal-2023' = rnorm(365, 2, 1),
                      'FEUA MDEC1' = rnorm(365, 2, 1),
                      check.names = FALSE)

Now I would like to extract each column that occurs, together with the date column, into its own data table. Ideally, all extracted data tables are then added to a list. I know that I should somehow do this with a for loop, but I can't work it out.

Once I have an individual data table for each product, I need to do the following for each of them (an example data table for AUTVTP Cal-2022 is used here):

DT <- data.table(date = seq(as.Date('2020-01-01'), by = '1 day', length.out = 365),
                 'AUTVTP Cal-2022' = rnorm(365, 2, 1), check.names = FALSE)


library(dplyr)  # provides %>%, mutate(), relocate(), mutate_if()
DT <- DT %>%
  mutate(month = format(date, '%b'), 
         date = format(date, '%d')) %>%
  tidyr::pivot_wider(names_from = date, values_from = 'AUTVTP Cal-2022') %>%
  relocate(`01`, .after = month)

## Calculate monthly, quarterly and yearly mean values: ##
DT <- setDT(DT)[, monthAvg := rowMeans(.SD, na.rm = TRUE), .SDcols = -1]
DT <- DT[, quartAvg := mean(monthAvg), ceiling(seq_len(nrow(DT))/3)]
DT <- DT[, yearAvg := mean(monthAvg), ceiling(seq_len(nrow(DT))/12)]

## Round all values of the data table to 2 digits: ##
DT <- DT %>% mutate_if(is.numeric, round, 2)

How can I do this?

Answer 1

Reshape to long format, then split.

split(
  melt(dt.data, id.vars = "date"),
  by = "variable", keep.by = FALSE)
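To make the shape of the result concrete, here is a minimal sketch on a tiny table. `dt.small` and its two product columns are made-up stand-ins for `dt.data`. Note that after `melt` the values sit in a column called `value`, not in a column named after the product, so code that refers to the product column by name would need adjusting.

```r
library(data.table)

# Hypothetical stand-in for dt.data: three dates, two products.
dt.small <- data.table(
  date = as.Date("2020-01-01") + 0:2,
  "A Cal-2021" = c(1, 2, 3),
  "B Cal-2021" = c(4, 5, 6),
  check.names = FALSE
)

parts <- split(melt(dt.small, id.vars = "date"), by = "variable", keep.by = FALSE)

names(parts)        # list elements are named after the original columns
names(parts[[1]])   # each element holds the columns "date" and "value"
```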

You can then use lapply to iterate over the list and do whatever your tidyverse code does.

However, you generally shouldn't split a data.table. It's inefficient and often unnecessary.
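As a sketch of what staying in long format could look like, the monthly averages from the question can be computed for all products at once by grouping on the product column. `dt.small`, its constant-valued product columns, and the names `dt.long` and `monthly` are made up for illustration, chosen so the result is easy to check.

```r
library(data.table)

# Hypothetical stand-in for dt.data: 60 days covering January and February 2020.
dt.small <- data.table(
  date = seq(as.Date("2020-01-01"), by = "1 day", length.out = 60),
  "A Cal-2021" = 1,   # constant series, so every mean is 1
  "B Cal-2021" = 3,   # constant series, so every mean is 3
  check.names = FALSE
)

# One row per (date, product) pair.
dt.long <- melt(dt.small, id.vars = "date", variable.name = "product")

# Monthly mean per product, computed by group instead of per split table.
monthly <- dt.long[, .(monthAvg = mean(value)),
                   by = .(product, month = format(date, "%Y-%m"))]
monthly
```

Whether this fits depends on the output you need: the grouped result is long (one row per product and month), not the wide day-by-column layout built in the question.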

Edit:

I suggest you forget the splitting. Wrap your code in a function like this:

foo <- function(DT, colname) {
  DT <- DT[, c("date", colname), with = FALSE]
  DT <- DT %>%
    mutate(month = format(date, '%b'), 
           date = format(date, '%d')) %>%
    tidyr::pivot_wider(names_from = date, values_from = all_of(colname)) %>%
    relocate(`01`, .after = month)
  
  ## Calculate monthly, quarterly and yearly mean values: ##
  DT <- setDT(DT)[, monthAvg := rowMeans(.SD, na.rm = TRUE), .SDcols = -1]
  DT <- DT[, quartAvg := mean(monthAvg), ceiling(seq_len(nrow(DT))/3)]
  DT <- DT[, yearAvg := mean(monthAvg), ceiling(seq_len(nrow(DT))/12)]
  
  ## Round all values of the data table to 2 digits: ##
  DT %>% mutate_if(is.numeric, round, 2)
}

Then, when you need the table for a specific column in your Shiny app, you can simply call this function:

foo(dt.data, 'DEB Cal-2019')

If you insist on pre-computing the list:

lapply(names(dt.data)[names(dt.data) != "date"], 
       foo, DT = dt.data)
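The list returned by `lapply` above is unnamed, so its elements can only be reached by position. A minimal sketch of naming the elements after their source columns with `setNames`; `keep_col` is a hypothetical, much simplified stand-in for `foo`, and `dt.small` is made-up data.

```r
library(data.table)

# Hypothetical stand-in for dt.data.
dt.small <- data.table(
  date = as.Date("2020-01-01") + 0:2,
  "A Cal-2021" = c(1, 2, 3),
  "B Cal-2021" = c(4, 5, 6),
  check.names = FALSE
)

# Hypothetical stand-in for foo(): keep the date and one product column.
keep_col <- function(DT, colname) {
  DT[, c("date", colname), with = FALSE]
}

product_cols <- setdiff(names(dt.small), "date")

# setNames() labels the vector, and lapply() carries those names to the result.
tables <- lapply(setNames(product_cols, product_cols), keep_col, DT = dt.small)

tables[["B Cal-2021"]]
```

With a named list, a table can be looked up by product name, for example from a select input in the app.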

Answer 2

Use split.default to create a list of single-column data tables, then cbind the first column (date) to each list element.

lapply(split.default(dt.data[, -1], names(dt.data[, -1])), cbind, dt.data[, 1])

Answer 1
1 accepted 2020-10-28 11:59:44

Answer 2
1 2020-10-29 04:18:51