I have the following data structure, with stocks S, each having features f:
year S1_f1 S1_f2 S2_f1 S2_f2 S3_f1 S3_f2 Sn_f1 Sn_f2
2011 0.1 0.4 0.12 0.42 0.2 0.5 n n
2012 0.4 0.7 0.42 0.72 0.5 0.8 n n
2013 0.7 0.9 0.72 0.5 0.8 0.9 n n
n n n n n n n n n
My original df has 10 observations but 50k+ predictors - so I want to generate more balance on the observation side.
Hence, I want to have the following dataframe:
year S1_f1 S1_f2 S2_f1 S2_f2 S3_f1 S3_f2 Sn_f1 Sn_f2
2011 0.1 0.4 0 0 0 0 0 0
2012 0.4 0.7 0 0 0 0 0 0
2013 0.7 0.9 0 0 0 0 0 0
2011 0 0 0.12 0.42 0 0 0 0
2012 0 0 0.42 0.72 0 0 0 0
2013 0 0 0.72 0.5 0 0 0 0
2011 0 0 0 0 0.2 0.5 0 0
2012 0 0 0 0 0.5 0.8 0 0
2013 0 0 0 0 0.8 0.9 0 0
n 0 0 0 0 0 0 n n
...and so on (example values).
I want to artificially multiply my timestamps via this approach.
Is there an elegant way to do this?
You can get close to the layout you want with the following code. Note that the result has an extra grp column and NA rather than 0 in the off-block cells:
library(data.table)
dcast(
melt(setDT(s), id="year")[, grp:=gsub("_.*$","",variable)],
year+grp~variable,
value.var="value"
)[order(grp,year)]
Output:
year grp S1_f1 S1_f2 S2_f1 S2_f2 S3_f1 S3_f2
<int> <char> <num> <num> <num> <num> <num> <num>
1: 2011 S1 0.1 0.4 NA NA NA NA
2: 2012 S1 0.4 0.7 NA NA NA NA
3: 2013 S1 0.7 0.9 NA NA NA NA
4: 2011 S2 NA NA 0.12 0.42 NA NA
5: 2012 S2 NA NA 0.42 0.72 NA NA
6: 2013 S2 NA NA 0.72 0.50 NA NA
7: 2011 S3 NA NA NA NA 0.2 0.5
8: 2012 S3 NA NA NA NA 0.5 0.8
9: 2013 S3 NA NA NA NA 0.8 0.9
Input:
s <- structure(list(year = 2011:2013, S1_f1 = c(0.1, 0.4, 0.7), S1_f2 = c(0.4,
0.7, 0.9), S2_f1 = c(0.12, 0.42, 0.72), S2_f2 = c(0.42, 0.72,
0.5), S3_f1 = c(0.2, 0.5, 0.8), S3_f2 = c(0.5, 0.8, 0.9)), row.names = c(NA,
-3L), class = "data.frame")
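If zeros are wanted instead of NA, one option is the fill argument of dcast(). The following is only a sketch on the sample input above; the names long and wide are placeholders of mine, and the helper column grp is dropped at the end so the columns match the original layout.

```r
library(data.table)

# Sample input from this answer
s <- data.frame(
  year  = 2011:2013,
  S1_f1 = c(0.1, 0.4, 0.7),    S1_f2 = c(0.4, 0.7, 0.9),
  S2_f1 = c(0.12, 0.42, 0.72), S2_f2 = c(0.42, 0.72, 0.5),
  S3_f1 = c(0.2, 0.5, 0.8),    S3_f2 = c(0.5, 0.8, 0.9)
)

# Long format: one row per (year, column) pair
long <- melt(as.data.table(s), id.vars = "year")

# Stock identifier = everything before the first underscore
long[, grp := sub("_.*$", "", variable)]

# Back to wide, one block of rows per stock; missing cells become 0
wide <- dcast(long, grp + year ~ variable, value.var = "value", fill = 0)

# Drop the helper column
wide[, grp := NULL]
print(wide)
```

dcast() orders the rows by grp and then year, so with names such as S10 the blocks come out in alphabetical order (S1, S10, S2, ...) rather than numeric order.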
One possible way to solve your problem (note that I did not convert the data, say df, into a data.table):
library(data.table)
result = sub("^S(\\d+)_.*", "\\1", names(df)[-1]) |>  # capture the whole stock number, e.g. "10" for S10
unique() |>
lapply(function(i) df[sprintf(c("year", "S%s_f1", "S%s_f2"), i)]) |>
rbindlist(use.names=TRUE, fill=TRUE) |>
setnafill(fill=0)
year S1_f1 S1_f2 S2_f1 S2_f2 S3_f1 S3_f2
<int> <num> <num> <num> <num> <num> <num>
1: 2011 0.1 0.4 0.00 0.00 0.0 0.0
2: 2012 0.4 0.7 0.00 0.00 0.0 0.0
3: 2013 0.7 0.9 0.00 0.00 0.0 0.0
4: 2011 0.0 0.0 0.12 0.42 0.0 0.0
5: 2012 0.0 0.0 0.42 0.72 0.0 0.0
6: 2013 0.0 0.0 0.72 0.50 0.0 0.0
7: 2011 0.0 0.0 0.00 0.00 0.2 0.5
8: 2012 0.0 0.0 0.00 0.00 0.5 0.8
9: 2013 0.0 0.0 0.00 0.00 0.8 0.9
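The code above hardcodes the two feature suffixes f1 and f2. If the number of features varies per stock, the columns could be picked up by prefix instead. This is a rough sketch under that assumption; the toy data frame (including the S10 columns), stocks and pieces are made up for illustration.

```r
library(data.table)

# Toy input: two stocks with a different number of features (illustrative only)
df <- data.frame(
  year   = 2011:2012,
  S1_f1  = c(0.1, 0.4), S1_f2  = c(0.4, 0.7),
  S10_f1 = c(0.2, 0.5), S10_f2 = c(0.5, 0.8), S10_f3 = c(0.3, 0.6)
)

# Stock prefixes in order of appearance: "S1", "S10"
stocks <- unique(sub("_.*$", "", names(df)[-1]))

# One data frame per stock: year plus every column starting with "<stock>_"
pieces <- lapply(stocks, function(s) {
  df[c("year", grep(paste0("^", s, "_"), names(df), value = TRUE))]
})

# Stack the pieces; columns missing from a piece are filled with NA
result <- rbindlist(pieces, use.names = TRUE, fill = TRUE)

# Replace the NAs by 0 in the feature columns only
setnafill(result, fill = 0, cols = setdiff(names(result), "year"))
print(result)
```

Anchoring the pattern with the trailing underscore keeps S1 from also matching the S10 columns.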
Using the sample data frame DF defined reproducibly in the Note at the end, create a vector g that groups the columns by stock; for the example it equals c("S1", "S1", "S2", "S2", "S3", "S3"). Then use it to split the columns into a list of matrices L, one matrix for each level of g. Apply bdiag from the Matrix package to that list to create a block diagonal matrix, then insert the year column and set the column names. Note that Matrix is a recommended package that ships with standard R installations, so it normally does not have to be installed separately.
library(Matrix)
g <- sub("_.*", "", names(DF)[-1])
L <- tapply(as.list(DF[-1]), g, function(x) as.matrix(as.data.frame(x)))
setNames(data.frame(DF$year, as.matrix(bdiag(L))), names(DF))
giving:
year S1_f1 S1_f2 S2_f1 S2_f2 S3_f1 S3_f2
1 2011 0.1 0.4 0.00 0.00 0.0 0.0
2 2012 0.4 0.7 0.00 0.00 0.0 0.0
3 2013 0.7 0.9 0.00 0.00 0.0 0.0
4 2011 0.0 0.0 0.12 0.42 0.0 0.0
5 2012 0.0 0.0 0.42 0.72 0.0 0.0
6 2013 0.0 0.0 0.72 0.50 0.0 0.0
7 2011 0.0 0.0 0.00 0.00 0.2 0.5
8 2012 0.0 0.0 0.00 0.00 0.5 0.8
9 2013 0.0 0.0 0.00 0.00 0.8 0.9
Note:
Lines <- "
year S1_f1 S1_f2 S2_f1 S2_f2 S3_f1 S3_f2
2011 0.1 0.4 0.12 0.42 0.2 0.5
2012 0.4 0.7 0.42 0.72 0.5 0.8
2013 0.7 0.9 0.72 0.5 0.8 0.9"
DF <- read.table(text = Lines, header = TRUE)
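For comparison, the same block layout can also be produced without Matrix by zeroing out, for each stock, the columns that belong to the other stocks and stacking the copies. This is just a sketch of that alternative, not the approach used above; blocks and result are names I picked for illustration.

```r
# Same sample data as in the Note
DF <- data.frame(
  year  = 2011:2013,
  S1_f1 = c(0.1, 0.4, 0.7),    S1_f2 = c(0.4, 0.7, 0.9),
  S2_f1 = c(0.12, 0.42, 0.72), S2_f2 = c(0.42, 0.72, 0.5),
  S3_f1 = c(0.2, 0.5, 0.8),    S3_f2 = c(0.5, 0.8, 0.9)
)

# Stock label of every non-year column
g <- sub("_.*", "", names(DF)[-1])

# For each stock, copy DF and set all columns of the other stocks to 0
blocks <- lapply(unique(g), function(k) {
  out <- DF
  out[names(DF)[-1][g != k]] <- 0
  out
})

# Stack the per-stock copies on top of each other
result <- do.call(rbind, blocks)
print(result)
```

This keeps a full copy of DF per stock in memory, so with 50k+ columns the sparse block diagonal matrix from bdiag is likely the more economical choice.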