How can I split a character string in a dataframe into multiple columns

Question

I'm working with a dataframe, one column of which contains values that are mostly numeric but may contain non-numeric entries. I would like to split this column into multiple columns. One of the new columns should contain the numeric portion of the original entry and another column should contain any non-numeric elements.

Here is a sample data frame:

df <- data.frame(ID=1:4,x=c('< 0.1','100','A 2.5', '200'))

Here is what I would like the data frame to look like:

ID   x1   x2
1    <    0.1
2         100
3    A    2.5
4         200

One feature of the data I am currently taking advantage of is that the character strings always have the same structure: the non-numeric elements (if they exist) always precede the numeric elements, and the two are always separated by a space.

I can use colsplit from the reshape package to split the column on whitespace. The problem is that it replicates any entry that can't be split into two elements:

require(reshape)
df <- transform(df, x=colsplit(x, split=" ", names=c("x1","x2")))
df
ID  x1   x2
1   <    0.1
2   100  100
3   A    2.5
4   200  200

This is not terribly problematic as I can just do some post-processing to remove the numeric elements from column "x1."
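A minimal sketch of that post-processing, assuming the replicated entries are exactly the rows where x1 and x2 are identical. The data frame `res` is a hypothetical stand-in for the colsplit() output shown above, built by hand so the snippet runs without the reshape package:

```r
# Hypothetical stand-in for the colsplit() result shown above
res <- data.frame(ID = 1:4,
                  x1 = c("<", "100", "A", "200"),
                  x2 = c("0.1", "100", "2.5", "200"),
                  stringsAsFactors = FALSE)

# An entry without a space was replicated, so x1 equals x2 in those rows;
# blank out x1 there and leave x2 untouched
res$x1[res$x1 == res$x2] <- ""
res
```

This relies on the structure described earlier (a non-numeric prefix, a space, then the number), so a genuine prefix never equals the numeric part.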

I can also accomplish what I would like to do using strsplit inside a function:

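library(data.table)  # provides rbindlist()

split.fn <- function(id){
  # split the entry for this ID on the space
  new.val <- unlist(strsplit(as.character(df$x[df$ID==id]), " "))
  if(length(new.val)==1){
    # no non-numeric prefix: x1 gets the string "NA" (not a missing value)
    return(data.frame(ID=id, x1="NA", x2=new.val))
  }else{
    return(data.frame(ID=id, x1=new.val[1], x2=new.val[2]))
  }
}
data.frame(rbindlist(lapply(unique(df$ID), split.fn)))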
ID   x1   x2
1    <    0.1
2    NA   100
3    A    2.5
4    NA   200

but this seems cumbersome.

Basically, both options I've outlined here will work, but I suspect there is a more elegant or direct way to get the desired data frame.

Answer 1

You can use separate() from tidyr

tidyr::separate(df, x, c("x1", "x2"), " ", fill = "left")
#   ID   x1  x2
# 1  1    < 0.1
# 2  2 <NA> 100
# 3  3    A 2.5
# 4  4 <NA> 200

If you absolutely need to remove the NA values, then you can do

tdy <- tidyr::separate(df, x, c("x1", "x2"), " ", fill = "left")
tdy[is.na(tdy)] <- ""

and then we have

tdy
#   ID x1  x2
# 1  1  < 0.1
# 2  2    100
# 3  3  A 2.5
# 4  4    200

Answer 2

This does not use any packages:

transform(df,
  x1 = ifelse(grepl(" ", x), sub(" .*", "", x), NA),
  x2 = sub(".* ", "", paste(x)))

giving:

  ID     x   x1  x2
1  1 < 0.1    < 0.1
2  2   100 <NA> 100
3  3 A 2.5    A 2.5
4  4   200 <NA> 200
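If the goal is the exact layout from the question (no original x column, an empty string instead of NA, and a numeric x2), the same base-R idea can be adapted. This is only a sketch under the question's stated assumption that any prefix is separated from the number by a single space; the names `has_prefix` and `out` are introduced here for illustration:

```r
df <- data.frame(ID = 1:4, x = c("< 0.1", "100", "A 2.5", "200"),
                 stringsAsFactors = FALSE)

# TRUE where the entry has a non-numeric prefix followed by a space
has_prefix <- grepl(" ", df$x, fixed = TRUE)

out <- data.frame(
  ID = df$ID,
  # text before the space, or "" when there is no prefix
  x1 = ifelse(has_prefix, sub(" .*", "", df$x), ""),
  # text after the space (or the whole entry), converted to numeric
  x2 = as.numeric(sub(".* ", "", df$x)),
  stringsAsFactors = FALSE
)
out
```

Converting x2 with as.numeric() would produce NA (with a warning) for any entry whose trailing part is not a valid number, which can serve as a quick check that the data really follow the assumed structure.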
