R - Preprocessing time series data

Question

I have the following data structure, with stocks S having features f:

year S1_f1  S1_f2 S2_f1 S2_f2 S3_f1 S3_f2 Sn_f1 Sn_f2
2011   0.1    0.4  0.12  0.42   0.2   0.5     n     n
2012   0.4    0.7  0.42  0.72   0.5   0.8     n     n
2013   0.7    0.9  0.72   0.5   0.8   0.9     n     n
n        n      n     n     n     n     n     n     n

My original df has 10 observations but 50k+ predictors, so I want to generate more balance on the observation side.

Hence, I want to have the following data frame:

year S1_f1 S1_f2 S2_f1 S2_f2 S3_f1 S3_f2 Sn_f1 Sn_f2
2011   0.1   0.4     0     0     0     0     0     0
2012   0.4   0.7     0     0     0     0     0     0
2013   0.7   0.9     0     0     0     0     0     0
2011     0     0  0.12  0.42     0     0     0     0
2012     0     0  0.42  0.72     0     0     0     0
2013     0     0  0.72   0.5     0     0     0     0
2011     0     0     0     0   0.2   0.5     0     0
2012     0     0     0     0   0.5   0.8     0     0
2013     0     0     0     0   0.8   0.9     0     0
n        0     0     0     0     0     0     n     n

...and so on (example values).

I want to artificially multiply my timestamps via this approach.

Is there an elegant way to do this?

Answer 1

You can convert what you have into what you want using the following code:

library(data.table)
dcast(
  melt(setDT(s), id="year")[, grp:=gsub("_.*$","",variable)],
  year+grp~variable,
  value.var="value"
  )[order(grp,year)]

Output:

    year    grp S1_f1 S1_f2 S2_f1 S2_f2 S3_f1 S3_f2
   <int> <char> <num> <num> <num> <num> <num> <num>
1:  2011     S1   0.1   0.4    NA    NA    NA    NA
2:  2012     S1   0.4   0.7    NA    NA    NA    NA
3:  2013     S1   0.7   0.9    NA    NA    NA    NA
4:  2011     S2    NA    NA  0.12  0.42    NA    NA
5:  2012     S2    NA    NA  0.42  0.72    NA    NA
6:  2013     S2    NA    NA  0.72  0.50    NA    NA
7:  2011     S3    NA    NA    NA    NA   0.2   0.5
8:  2012     S3    NA    NA    NA    NA   0.5   0.8
9:  2013     S3    NA    NA    NA    NA   0.8   0.9
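The output above contains NA where the question asks for 0, and it keeps the helper column grp. A minimal sketch of one way to close that gap, assuming the input is stored in a data frame called s as in the code above (the intermediate names long, wide and feature_cols are only illustrative):

```r
library(data.table)

# Same sample input as in this answer, assigned to `s`
s <- data.frame(
  year  = 2011:2013,
  S1_f1 = c(0.1, 0.4, 0.7),    S1_f2 = c(0.4, 0.7, 0.9),
  S2_f1 = c(0.12, 0.42, 0.72), S2_f2 = c(0.42, 0.72, 0.5),
  S3_f1 = c(0.2, 0.5, 0.8),    S3_f2 = c(0.5, 0.8, 0.9)
)

# Long format, with the stock prefix (text before the first "_") as group
long <- melt(as.data.table(s), id.vars = "year")
long[, grp := sub("_.*$", "", variable)]

# Back to wide format: one row per (year, stock) pair
wide <- dcast(long, year + grp ~ variable, value.var = "value")[order(grp, year)]

# Replace NA with 0 in the feature columns only, then drop the helper column
feature_cols <- setdiff(names(wide), c("year", "grp"))
setnafill(wide, fill = 0, cols = feature_cols)
wide[, grp := NULL]

print(wide)
```

Restricting setnafill to the feature columns leaves year untouched; whether zeros are an appropriate stand-in for "no value" depends on the model that consumes the data.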

Input (the data frame called s in the code above):

structure(list(year = 2011:2013, S1_f1 = c(0.1, 0.4, 0.7), S1_f2 = c(0.4, 
0.7, 0.9), S2_f1 = c(0.12, 0.42, 0.72), S2_f2 = c(0.42, 0.72, 
0.5), S3_f1 = c(0.2, 0.5, 0.8), S3_f2 = c(0.5, 0.8, 0.9)), row.names = c(NA, 
-3L), class = "data.frame")

Answer 2

One possible way to solve your problem (note that I did not convert the data, say df, into a data.table):

library(data.table)

result = sub("^S(\\d+)_.*", "\\1", names(df)[-1]) |> 
  unique() |> 
  lapply(function(i) df[sprintf(c("year", "S%s_f1", "S%s_f2"), i)]) |> 
  rbindlist(use.names=TRUE, fill=TRUE) |> 
  setnafill(fill=0)

    year S1_f1 S1_f2 S2_f1 S2_f2 S3_f1 S3_f2
   <int> <num> <num> <num> <num> <num> <num>
1:  2011   0.1   0.4  0.00  0.00   0.0   0.0
2:  2012   0.4   0.7  0.00  0.00   0.0   0.0
3:  2013   0.7   0.9  0.00  0.00   0.0   0.0
4:  2011   0.0   0.0  0.12  0.42   0.0   0.0
5:  2012   0.0   0.0  0.42  0.72   0.0   0.0
6:  2013   0.0   0.0  0.72  0.50   0.0   0.0
7:  2011   0.0   0.0  0.00  0.00   0.2   0.5
8:  2012   0.0   0.0  0.00  0.00   0.5   0.8
9:  2013   0.0   0.0  0.00  0.00   0.8   0.9
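The code above hard-codes the two feature names f1 and f2. As a hedged sketch of the same idea for an arbitrary number of features per stock, the columns can be selected by their prefix instead; the names prefix and blocks below are illustrative and not part of the original answer:

```r
library(data.table)

# Sample data from the question
df <- data.frame(
  year  = 2011:2013,
  S1_f1 = c(0.1, 0.4, 0.7),    S1_f2 = c(0.4, 0.7, 0.9),
  S2_f1 = c(0.12, 0.42, 0.72), S2_f2 = c(0.42, 0.72, 0.5),
  S3_f1 = c(0.2, 0.5, 0.8),    S3_f2 = c(0.5, 0.8, 0.9)
)

# Stock prefix of every non-year column, e.g. "S1", "S2", "S3"
prefix <- sub("_.*$", "", names(df)[-1])

# One block per stock: the year plus every column carrying that prefix
blocks <- lapply(unique(prefix), function(p) {
  df[c("year", names(df)[-1][prefix == p])]
})

# Stack the blocks; columns missing from a block become NA, then 0
result <- rbindlist(blocks, use.names = TRUE, fill = TRUE)
setnafill(result, fill = 0, cols = names(df)[-1])
setcolorder(result, names(df))

print(result)
```

This assumes every column other than year follows the pattern stock, underscore, feature.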

Answer 3

Using the sample data frame DF defined reproducibly in the Note at the end, create a vector g that defines a grouping of the columns; for the example it equals c("S1", "S1", "S2", "S2", "S3", "S3"). Then use it to split the columns into a list of matrices L, one matrix for each level of g. Apply bdiag from the Matrix package to that list to create a block diagonal matrix, then insert the year column and set the column names. Note that the Matrix package ships with R as a recommended package and does not have to be installed separately, so nothing beyond a standard R installation is needed.

library(Matrix)

g <- sub("_.*", "", names(DF)[-1])
L <- tapply(as.list(DF[-1]), g, function(x) as.matrix(as.data.frame(x)))
setNames(data.frame(DF$year, as.matrix(bdiag(L))), names(DF))

giving:

  year S1_f1 S1_f2 S2_f1 S2_f2 S3_f1 S3_f2
1 2011   0.1   0.4  0.00  0.00   0.0   0.0
2 2012   0.4   0.7  0.00  0.00   0.0   0.0
3 2013   0.7   0.9  0.00  0.00   0.0   0.0
4 2011   0.0   0.0  0.12  0.42   0.0   0.0
5 2012   0.0   0.0  0.42  0.72   0.0   0.0
6 2013   0.0   0.0  0.72  0.50   0.0   0.0
7 2011   0.0   0.0  0.00  0.00   0.2   0.5
8 2012   0.0   0.0  0.00  0.00   0.5   0.8
9 2013   0.0   0.0  0.00  0.00   0.8   0.9
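For readers who want to see what the block diagonal layout amounts to, here is a rough sketch that builds the same result with an explicit loop and no packages. It is not the method of this answer, only an illustration; the names stocks, out and res are assumptions made for the example:

```r
# Sample data, as in the Note of this answer
DF <- read.table(text = "
year S1_f1 S1_f2 S2_f1 S2_f2 S3_f1 S3_f2
2011 0.1 0.4 0.12 0.42 0.2 0.5
2012 0.4 0.7 0.42 0.72 0.5 0.8
2013 0.7 0.9 0.72 0.5 0.8 0.9", header = TRUE)

g <- sub("_.*", "", names(DF)[-1])  # stock prefix of each feature column
stocks <- unique(g)
n <- nrow(DF)

# Start from an all-zero matrix holding one block of n rows per stock
out <- matrix(0, nrow = n * length(stocks), ncol = ncol(DF) - 1,
              dimnames = list(NULL, names(DF)[-1]))

# Copy each stock's columns into its own row block
for (k in seq_along(stocks)) {
  rows <- (k - 1) * n + seq_len(n)
  cols <- which(g == stocks[k])
  out[rows, cols] <- as.matrix(DF[-1][cols])
}

# Repeat the year column once per stock and attach it
res <- data.frame(year = rep(DF$year, times = length(stocks)), out)
print(res)
```

The result has nrow(DF) times the number of stocks rows, while the number of columns stays the same as in the input.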

Note

Lines <- "
year S1_f1  S1_f2 S2_f1 S2_f2 S3_f1 S3_f2
2011   0.1    0.4  0.12  0.42   0.2   0.5
2012   0.4    0.7  0.42  0.72   0.5   0.8
2013   0.7    0.9  0.72   0.5   0.8   0.9"
DF <- read.table(text = Lines, header = TRUE)
