
How to combine an irregular number of columns into one in R


I have a data file that needs some tidying. I want everything from Mfb to [gbkey=CDS] combined into one column. Also, the string in double quotes should be split into individual columns at the semicolon delimiter. The number of columns the string splits into is irregular throughout the file; it just needs to be split at each delimiter.

Some thoughts; I'm not sure how well they will work on your real file.

# somefile <- readLines("somefile.dat")
somefile <- c(
  'MfB...[dbkey=CDS]"HEAT_2 :HEAT_2 :"',
  'MfB...[dbkey=CDS]"NO_DOMAIN"'
)

gsub('^([^"]*).*', '\\1', somefile)
# [1] "MfB...[dbkey=CDS]" "MfB...[dbkey=CDS]"
gsub('^[^"]*"(.*)".*', '\\1', somefile)
# [1] "HEAT_2 :HEAT_2 :" "NO_DOMAIN"       
splits <- strsplit(gsub('^[^"]*"(.*)".*', '\\1', somefile), "\\s*:\\s*")
splits
# [[1]]
# [1] "HEAT_2" "HEAT_2"
# [[2]]
# [1] "NO_DOMAIN"
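# pad every split to the same length with NA, then bind the rows into a data frame
dat <- do.call(rbind.data.frame,
               c(lapply(splits, `length<-`, max(lengths(splits))),
                 list(stringsAsFactors = FALSE)))
names(dat) <- paste0("V", seq_along(dat))
# keep the text before the first double quote as its own column
dat$V0 <- gsub('^([^"]*).*', '\\1', somefile)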
dat
#          V1     V2                V0
# 1    HEAT_2 HEAT_2 MfB...[dbkey=CDS]
# 2 NO_DOMAIN   <NA> MfB...[dbkey=CDS]
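
If this needs to be reused, the same steps can be wrapped in a small function. This is only a sketch: the name `split_quoted` and its `delim` argument are made up for illustration, and it assumes every line contains exactly one double-quoted string, as in the sample above. `delim` is treated as a regular expression, so pass `";"` if the real file uses semicolons.

```r
# Hypothetical helper (not part of base R): returns the text before the first
# double quote as V0, followed by one column per delimited field (V1, V2, ...).
# Assumes each line has one double-quoted string; `delim` is a regex.
split_quoted <- function(lines, delim = ":") {
  # everything up to (not including) the first double quote
  prefix <- sub('^([^"]*).*', '\\1', lines)
  # the text between the first and the last double quote
  quoted <- sub('^[^"]*"(.*)".*', '\\1', lines)
  # split at the delimiter, swallowing surrounding whitespace
  fields <- strsplit(quoted, paste0("\\s*", delim, "\\s*"))
  # pad shorter rows with NA so that every row has the same width
  width <- max(lengths(fields))
  mat <- do.call(rbind, lapply(fields, `length<-`, width))
  out <- data.frame(V0 = prefix, mat, stringsAsFactors = FALSE)
  names(out)[-1] <- paste0("V", seq_len(width))
  out
}

somefile <- c(
  'MfB...[dbkey=CDS]"HEAT_2 :HEAT_2 :"',
  'MfB...[dbkey=CDS]"NO_DOMAIN"'
)
res <- split_quoted(somefile)
```

Here `res` holds the same values as `dat` above, except that `V0` comes first. A line without any double quotes would not match the second pattern and would end up whole in `V1`, so it is worth checking the input before relying on this.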
