简体   繁体   中英

Split character in column and name

I want to split characters. Although I have a large dataframe to work, the following small example to show what need to be done.

  mydf <- data.frame (name = c("L1", "L2", "L3"), 
    M1 = c("AC", "AT", NA), M2 = c("CC", "--", "TC"), M3 = c("AT", "TT", "AG"))

I want to split the characters for variables M1 to M3 (in real dataset I have > 6000 variables)

  name  M1a M1b   M2a M2b  M3a  M3b 
   L1   A    C    C    C    A     T
   L2   A    T    -    -    T     T
   L3   NA   NA   T     C    A     G

I tried the following codes:

func<- function(x) {sapply( strsplit(x, ""),
                     match, table= c("A","C","T","G", "--", NA))}

odataframe <- data.frame(apply(mydf, 1, func) )
colnames(odataframe) <-  paste(rep(names(mydf), each = 2), c("a", "b"), sep = "")
odataframe

Here you go:

splitCol <- function(x){
  x <- as.character(x)
  x[is.na(x)] <- "$$"
  z <- matrix(unlist(strsplit(x, split="")), ncol=2, byrow=TRUE)
  z[z=="$"] <- NA
  z
}


newdf <- as.data.frame(do.call(cbind, lapply(mydf[, -1], splitCol)))
names(newdf) <- paste(rep(names(mydf[, -1]), each=2), c("a", "b"), sep="")
newdf <- data.frame(mydf[, 1, drop=FALSE], newdf)

newdf
  name  M1a  M1b M2a M2b M3a M3b
1   L1    A    C   C   C   A   T
2   L2    A    T   -   -   T   T
3   L3 <NA>  <NA   T   C   A   G

Andrie's code as a replicable function

splitCol <- function(dataframe, splitVars=names(dataframe)){
split.DF <- dataframe[,splitVars]
keep.DF <- dataframe[, !names(dataframe) %in% c(splitVars)]

X <- function(x)matrix(unlist(strsplit(as.character(x), split="")), ncol=2, byrow=TRUE)

newdf <- as.data.frame(do.call(cbind, suppressWarnings(lapply(split.DF, X))) )
names(newdf) <- paste(rep(names(split.DF), each=2), c(".a", ".b"), sep="") 
data.frame(keep.DF,newdf)
}

Test it out

splitCol(mydf)
splitCol(mydf, c('M1','M2'))

Please don't vote this as the correct answer. Andrie's answer is clearly the first correct answer. This is just an extension of his code to more situations. Thanks for the question and thanks for the code Andrie.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM