
Inserting NA columns at specific positions of a data frame in R

I'm struggling to insert columns of NAs at specific positions in a data frame.

For instance, I have the dataset:

dataset <- data.frame(c1 = 1:5, 
                      c2 = 2:6, 
                      c3 = 3:7, 
                      c4 = 4:8, 
                      c5 = 5:9, 
                      c6 = 10:14, 
                      c7 = 15:19, 
                      c8 = 20:24, 
                      c9 = 25:29, 
                      c10 = 30:34)

In this example, I'd like to insert 4 NA columns after every 2 existing columns of dataset. The result would be something like:

dataset.answer <- data.frame(c1 = 1:5,
                             c2 = 2:6,
                             c.und.1 = rep(NA, nrow(dataset)),
                             c.und.1 = rep(NA, nrow(dataset)),
                             c.und.1 = rep(NA, nrow(dataset)),
                             c.und.1 = rep(NA, nrow(dataset)),
                             c3 = 3:7,
                             c4 = 4:8,
                             c.und.1 = rep(NA, nrow(dataset)),
                             c.und.1 = rep(NA, nrow(dataset)),
                             c.und.1 = rep(NA, nrow(dataset)),
                             c.und.1 = rep(NA, nrow(dataset)),
                             c5 = 5:9,
                             c6 = 10:14,
                             c.und.1 = rep(NA, nrow(dataset)),
                             c.und.1 = rep(NA, nrow(dataset)),
                             c.und.1 = rep(NA, nrow(dataset)),
                             c.und.1 = rep(NA, nrow(dataset)),
                             c7 = 15:19,
                             c8 = 20:24,
                             c.und.1 = rep(NA, nrow(dataset)),
                             c.und.1 = rep(NA, nrow(dataset)),
                             c.und.1 = rep(NA, nrow(dataset)),
                             c.und.1 = rep(NA, nrow(dataset)),
                             c9 = 25:29,
                             c10 = 30:34)

Any suggestions for an elegant way to do it?

The following base R solution may not be very elegant, but I believe it works.

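insertNAcol <- function(DF, every = 2, na.cols = 4){
  n <- ncol(DF)
  # block of na.cols all-NA columns with the same number of rows as DF
  tmp <- DF[1]
  tmp[2:(1 + na.cols)] <- NA
  tmp <- tmp[-1]
  # number of groups of `every` columns (the last group may be shorter)
  m <- ceiling(n / every)
  res <- DF[1:min(every, n)]
  # append "NA block + next group" for each remaining group
  for(i in seq_len(m)[-1]){
    DF2 <- DF[(every*(i - 1) + 1):min(every*i, n)]
    res <- cbind.data.frame(res, tmp, DF2)
  }
  res
}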

insertNAcol(dataset)
insertNAcol(dataset, 3, 3)
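For comparison, the same group-by-group idea can be written without an explicit loop. This is a minimal sketch, not the answer's own code: `insert_na_cols` is a hypothetical name, and the split()/Reduce() combination is a swapped-in technique. It splits the column positions into consecutive groups of `every`, then glues "NA block + next group" onto the first group.

```r
# Hypothetical loop-free variant: place na.cols NA columns between
# consecutive groups of `every` columns (no NA block after the last group).
insert_na_cols <- function(DF, every = 2, na.cols = 4) {
  n <- ncol(DF)
  # one block of na.cols logical NA columns with as many rows as DF
  na_block <- as.data.frame(matrix(NA, nrow(DF), na.cols))
  # consecutive groups of `every` column positions; the last may be shorter
  groups <- split(seq_len(n), ceiling(seq_len(n) / every))
  pieces <- lapply(groups, function(idx) DF[idx])
  # start from the first group, then append "NA block + next group"
  Reduce(function(acc, piece) cbind(acc, na_block, piece),
         pieces[-1], pieces[[1]])
}

# Usage: insert_na_cols(dataset) or insert_na_cols(dataset, 3, 3)
```

Like the answer above, this keeps duplicated names for the NA columns (V1, V2, ...), because cbind() on data frames does not make names unique.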

Create a second data frame of NA values, then replace the relevant columns.

df <- as.data.frame(matrix(NA, nrow(dataset), 6 * ncol(dataset) / 2 - 4))
cls <- rep(seq(0, ncol(df), by = 6), each=2) + 1:2
df[cls] <- dataset
df
#   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26
# 1  1  2 NA NA NA NA  3  4 NA  NA  NA  NA   5  10  NA  NA  NA  NA  15  20  NA  NA  NA  NA  25  30
# 2  2  3 NA NA NA NA  4  5 NA  NA  NA  NA   6  11  NA  NA  NA  NA  16  21  NA  NA  NA  NA  26  31
# 3  3  4 NA NA NA NA  5  6 NA  NA  NA  NA   7  12  NA  NA  NA  NA  17  22  NA  NA  NA  NA  27  32
# 4  4  5 NA NA NA NA  6  7 NA  NA  NA  NA   8  13  NA  NA  NA  NA  18  23  NA  NA  NA  NA  28  33
# 5  5  6 NA NA NA NA  7  8 NA  NA  NA  NA   9  14  NA  NA  NA  NA  19  24  NA  NA  NA  NA  29  34

The number of columns of df, 6 * ncol(dataset) / 2 - 4, is determined by:

  • 6: 2 numeric columns + 4 NA columns per set
  • ncol(dataset) / 2: the number of "sets" we are creating
  • - 4: removes the 4 NA columns that would otherwise be tacked on to the end
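The same arithmetic, together with the cls index vector, can be generalized to other group sizes. This is a minimal sketch, assuming the number of data columns is a multiple of `every`; `na_gap_layout`, `every` and `na.cols` are hypothetical names for illustration. For 10 columns with the defaults it gives 26 total columns and cls = 1 2 7 8 13 14 19 20 25 26, matching the answer.

```r
# Hypothetical helper (not from the answer): total width of the padded data
# frame and the positions of the original columns when `na.cols` NA columns
# follow every `every` columns, with no NA block after the last group.
na_gap_layout <- function(n_data, every = 2, na.cols = 4) {
  stopifnot(n_data %% every == 0)        # assumption: whole groups only
  step <- every + na.cols                # width of one "set"
  n_sets <- n_data / every               # number of sets
  n_total <- step * n_sets - na.cols     # drop the trailing NA block
  starts <- seq(0, by = step, length.out = n_sets)
  cls <- rep(starts, each = every) + seq_len(every)
  list(n_total = n_total, cls = cls)
}

# Usage with a small 4-column data frame
small <- data.frame(c1 = 1:5, c2 = 2:6, c3 = 3:7, c4 = 4:8)
lay <- na_gap_layout(ncol(small))
padded <- as.data.frame(matrix(NA, nrow(small), lay$n_total))
padded[lay$cls] <- small
padded
```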

Replacing the column names is then fairly easy:

names(df)[cls] <- names(dataset)
names(df)[-cls] <- "c.und.1"

Note, though, that having multiple columns with the same name is not recommended.
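If distinct names are preferred, one option is to number the NA columns instead of reusing a single name. This is a sketch, not part of the original answer; the `c.und.` prefix just follows the question, and the first lines rebuild df so the block runs on its own.

```r
# Rebuild the answer's df so this block is self-contained
dataset <- data.frame(c1 = 1:5, c2 = 2:6, c3 = 3:7, c4 = 4:8, c5 = 5:9,
                      c6 = 10:14, c7 = 15:19, c8 = 20:24, c9 = 25:29,
                      c10 = 30:34)
df <- as.data.frame(matrix(NA, nrow(dataset), 6 * ncol(dataset) / 2 - 4))
cls <- rep(seq(0, ncol(df), by = 6), each = 2) + 1:2
df[cls] <- dataset

names(df)[cls] <- names(dataset)
# c.und.1, c.und.2, ..., c.und.16 instead of 16 copies of "c.und.1"
names(df)[-cls] <- paste0("c.und.", seq_len(ncol(df) - length(cls)))
names(df)
```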
