I want to split character strings into separate columns. I have a large data frame to work with, but the following small example shows what needs to be done.
mydf <- data.frame (name = c("L1", "L2", "L3"),
M1 = c("AC", "AT", NA), M2 = c("CC", "--", "TC"), M3 = c("AT", "TT", "AG"))
I want to split the characters for variables M1 to M3 (in the real dataset I have > 6000 variables), so that the output looks like this:
name M1a M1b M2a M2b M3a M3b
L1 A C C C A T
L2 A T - - T T
L3 NA NA T C A G
I tried the following code:
func<- function(x) {sapply( strsplit(x, ""),
match, table= c("A","C","T","G", "--", NA))}
odataframe <- data.frame(apply(mydf, 1, func) )
colnames(odataframe) <- paste(rep(names(mydf), each = 2), c("a", "b"), sep = "")
odataframe
Here you go:
splitCol <- function(x){
x <- as.character(x)
x[is.na(x)] <- "$$"
z <- matrix(unlist(strsplit(x, split="")), ncol=2, byrow=TRUE)
z[z=="$"] <- NA
z
}
newdf <- as.data.frame(do.call(cbind, lapply(mydf[, -1], splitCol)))
names(newdf) <- paste(rep(names(mydf[, -1]), each=2), c("a", "b"), sep="")
newdf <- data.frame(mydf[, 1, drop=FALSE], newdf)
newdf
name M1a M1b M2a M2b M3a M3b
1 L1 A C C C A T
2 L2 A T - - T T
3 L3 <NA> <NA> T C A G
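As an aside, the same split can be sketched without the "$$" placeholder, because substr() returns NA for NA input. This is only a minimal sketch, assuming every non-missing value is exactly two characters long; the helper name splitBySubstr and the objects halves and newdf2 are made up for illustration and are not part of the answer above.

```r
mydf <- data.frame(name = c("L1", "L2", "L3"),
                   M1 = c("AC", "AT", NA),
                   M2 = c("CC", "--", "TC"),
                   M3 = c("AT", "TT", "AG"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: returns a two-column character matrix with the
# first and second character of each value; NA input stays NA.
splitBySubstr <- function(x) {
  x <- as.character(x)
  cbind(a = substr(x, 1, 1), b = substr(x, 2, 2))
}

# Split every column except the first, then bind the pieces side by side.
halves <- do.call(cbind, lapply(mydf[, -1], splitBySubstr))
colnames(halves) <- paste(rep(names(mydf)[-1], each = 2), c("a", "b"), sep = "")

newdf2 <- data.frame(mydf[, 1, drop = FALSE], halves, stringsAsFactors = FALSE)
newdf2
```

With > 6000 variables this keeps the per-column work to two vectorised substr() calls, but it silently truncates any value longer than two characters, so it is worth checking nchar() on the real data first.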
Andrie's code as a reusable function:
splitCol <- function(dataframe, splitVars=names(dataframe)){
  # drop=FALSE keeps a single selected column as a data frame
  split.DF <- dataframe[, splitVars, drop=FALSE]
  keep.DF <- dataframe[, !names(dataframe) %in% splitVars, drop=FALSE]
  X <- function(x){
    x <- as.character(x)
    x[is.na(x)] <- "$$"  # placeholder so an NA still splits into two cells
    z <- matrix(unlist(strsplit(x, split="")), ncol=2, byrow=TRUE)
    z[z=="$"] <- NA
    z
  }
  newdf <- as.data.frame(do.call(cbind, lapply(split.DF, X)))
  names(newdf) <- paste(rep(names(split.DF), each=2), c(".a", ".b"), sep="")
  data.frame(keep.DF, newdf)
}
Test it out:
splitCol(mydf)
splitCol(mydf, c('M1','M2'))
Please don't vote this as the correct answer. Andrie's answer is clearly the first correct answer. This is just an extension of his code to more situations. Thanks for the question and thanks for the code Andrie.