Vector to sliding matrix in R

I am trying to create a function that takes a vector and creates two sliding matrices, like below:

Input, Output
[d01, d02, d03, d04, d05, d06, d07], [d08, d09, d10, d11, d12, d13, d14]
[d02, d03, d04, d05, d06, d07, d08], [d09, d10, d11, d12, d13, d14, d15]
...

I tried to adapt some Python code to R, but I am running into problems and cannot find the error (I am not used to R).

This is the R code:

create_dataset = function(data, n_input, n_out){
        dataX = c()
        dataY = c()
        in_start = 0
        for (i in 1:range(length(data))) {
                #define the end of the input sequence
                in_end = in_start + n_input
                out_end = in_end + n_out
                        if(out_end <= length(data)){
                                x_input = data[in_start:in_end, 1]
                                X = append(x_input)
                                y = append(data[in_end:out_end], 1)
                        }
                #move along one time step
                in_start = in_start + 1
        }
        
   X; Y
}

I get this error when calling the function:

> create_dataset(data, n_input = 5, n_out = 5)
Error in data[in_start:in_end, 1] : incorrect number of dimensions
In addition: Warning message:
In 1:range(length(data)) :
  numerical expression has 2 elements: only the first used

EDIT:

Here is the Python code I am trying to adapt to R:

# convert history into inputs and outputs
def to_supervised(train, n_input, n_out):
    X, y = list(), list()
    in_start = 0
    # step over the entire history one time step at a time
    for _ in range(len(data)):
        # define the end of the input sequence
        in_end = in_start + n_input
        out_end = in_end + n_out
        # ensure we have enough data for this instance
        if out_end <= len(data):
            x_input = data[in_start:in_end, 0]
            x_input = x_input.reshape((len(x_input), 1))
            X.append(x_input)
            y.append(data[in_end:out_end, 0])
        # move along one time step
        in_start += 1
    return array(X), array(y)

Here are two approaches. Also see the related question "Lagging time series data".

1) In R one normally works with whole objects rather than iterating over indexes. Assuming inputs v, k1 and k2, we compute e, the sliding matrix with k1+k2 columns. embed returns each window in reverse order, so we reverse the columns with [, k:1]. The first k1 columns of e then form the first matrix and the remaining k2 columns form the second.

# inputs
v <- 1:12   # 1, 2, ..., 12
k1 <- k2 <- 3

k <- k1 + k2
e <- embed(v, k)[, k:1]

ik1 <- 1:k1
e[, ik1]
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    2    3    4
## [3,]    3    4    5
## [4,]    4    5    6
## [5,]    5    6    7
## [6,]    6    7    8
## [7,]    7    8    9

e[, -ik1]
##      [,1] [,2] [,3]
## [1,]    4    5    6
## [2,]    5    6    7
## [3,]    6    7    8
## [4,]    7    8    9
## [5,]    8    9   10
## [6,]    9   10   11
## [7,]   10   11   12
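To get both matrices from a single call, the steps in (1) could be wrapped in a small function. This is only a sketch: the name sliding_pair, the list element names X and Y, and the input checks are assumptions added for illustration and are not part of the original answer.

```r
# Hypothetical helper wrapping approach (1); the name is illustrative only.
sliding_pair <- function(v, k1, k2) {
  k <- k1 + k2
  # need at least one complete window and non-empty input/output parts
  stopifnot(k1 >= 1, k2 >= 1, length(v) >= k)
  # embed() lists each window newest-first, so reverse the columns
  e <- embed(v, k)[, k:1, drop = FALSE]
  ik1 <- seq_len(k1)
  # drop = FALSE keeps the results as matrices even with one row or column
  list(X = e[, ik1, drop = FALSE], Y = e[, -ik1, drop = FALSE])
}

res <- sliding_pair(1:12, 3, 3)
dim(res$X)    # 7 3
res$X[1, ]    # 1 2 3
res$Y[1, ]    # 4 5 6
```

Naming the list elements lets the caller write res$X and res$Y instead of relying on position.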

2) Regarding the R code in the question:

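  • in R, range takes a vector and returns a two-element vector holding its minimum and maximum, so 1:range(length(data)) is not what the for loop needs (hence the warning); use seq_along(data) instead
  • indexes in R start at 1 rather than 0, and a:b includes both endpoints, so the Python slice data[in_start:in_end] corresponds to data[(in_start+1):in_end] in R
  • if data is a plain vector, as assumed below, it takes a single index; data[in_start:in_end, 1] is what raises "incorrect number of dimensions"
  • the return value of a function must be a single object, so ending the function with X; Y cannot return both. We return a two-element list of matrices instead.
  • iteratively appending to an object is inefficient in R. This can be addressed by preallocating the result or not using a loop; however, we don't address this problem below as we already have a better implementation above in (1).
  • the question's code names its variables inconsistently: it initializes dataX and dataY, assigns to X and y, and then refers to X and Y at the end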
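The first two points can be checked at the console. The snippet below is a small illustration using a made-up vector v (not the data from the question); start and end are hypothetical names standing in for the Python slice bounds.

```r
v <- c(10, 20, 30, 40, 50)

range(length(v))   # 5 5: the min and max of the single value 5
seq_along(v)       # 1 2 3 4 5: one index per element

# Python's v[0:3] is a 0-based, half-open slice [start, end).
# In R the same elements are v[(start + 1):end], because indexes
# start at 1 and a:b includes both endpoints.
start <- 0
end <- 3
v[(start + 1):end]   # 10 20 30

# Reusing the Python bounds unchanged goes wrong once start > 0:
v[1:3]               # 10 20 30, but Python's v[1:3] would be 20 30
```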

Although this entire approach is not how one would normally write R software, in order to make the minimal changes to get it to work we can write the following.

# data is plain vector, n_input and n_out are scalars
# result is 2 element list of matrices
create_dataset <- function(data, n_input, n_out) {
        X <- matrix(nrow = 0, ncol = n_input)
        Y <- matrix(nrow = 0, ncol = n_out)
        in_start <- 0
        for (i in seq_along(data)) {
                # define the end of the input sequence
                in_end <- in_start + n_input
                out_end <- in_end + n_out
                # only use windows that fit entirely inside data
                if (out_end <= length(data)) {
                        X <- rbind(X, data[(in_start+1):in_end])
                        Y <- rbind(Y, data[(in_end+1):out_end])
                }
                # move along one time step
                in_start <- in_start + 1
        }

        list(X, Y)
}

# inputs defined in (1)
create_dataset(v, k1, k2)

giving this two element list of matrices:

[[1]]
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    2    3    4
[3,]    3    4    5
[4,]    4    5    6
[5,]    5    6    7
[6,]    6    7    8
[7,]    7    8    9

[[2]]
     [,1] [,2] [,3]
[1,]    4    5    6
[2,]    5    6    7
[3,]    6    7    8
[4,]    7    8    9
[5,]    8    9   10
[6,]    9   10   11
[7,]   10   11   12
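For completeness, the preallocation mentioned in the list above might look like the following. This is a sketch rather than part of the original answer: the name create_dataset_prealloc is hypothetical, and it assumes data is a plain vector long enough to hold at least one complete window.

```r
# Hypothetical preallocating variant of create_dataset; illustrative only.
create_dataset_prealloc <- function(data, n_input, n_out) {
  # number of complete input/output windows that fit in data
  n <- length(data) - n_input - n_out + 1
  stopifnot(n >= 1)
  # allocate both results once instead of growing them with rbind
  X <- matrix(NA, nrow = n, ncol = n_input)
  Y <- matrix(NA, nrow = n, ncol = n_out)
  for (i in seq_len(n)) {
    in_end <- i + n_input - 1    # last index of the input window
    out_end <- in_end + n_out    # last index of the output window
    X[i, ] <- data[i:in_end]
    Y[i, ] <- data[(in_end + 1):out_end]
  }
  list(X, Y)
}

res <- create_dataset_prealloc(1:12, 3, 3)
res[[1]][1, ]   # 1 2 3
res[[2]][7, ]   # 10 11 12
```

Because the number of windows is known up front, the loop runs exactly n times and the if check inside it is no longer needed.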
