简体   繁体   中英

String split of delimited values in 2 columns simultaneously - R

I'm looking for an elegant way to split and convert semicolon-separated values from two columns into rows. Data looks like as:

1        A0A024R442;E7ETB3;Q9ULA0;E7EMB6           443;465;447;382
2 A0A024R442;E7ETB3;Q9ULA0;E7EMB6;C9JBE1       323;345;327;262;309

The required output is:

A0A024R442        443
    E7ETB3        465
    Q9ULA0        447
  ...................
    C9JBE1        309

The number of elements in columns is always equal per row as depicted above. I have tried the strsplit() function but I think I can't split delimited values on multiple columns simultaneously (please correct me if I'm wrong).

I would appreciate any suggestion.

Here is a data.table option

library(data.table)
DT <- fread(text, header = FALSE)

out <- DT[, lapply(.SD, tstrsplit, split = ";", fixed = TRUE), by = 1:nrow(DT)]
out[, nrow := NULL][]
#           V1  V2
#1: A0A024R442 443
#2:     E7ETB3 465
#3:     Q9ULA0 447
#4:     E7EMB6 382
#5: A0A024R442 323
#6:     E7ETB3 345
#7:     Q9ULA0 327
#8:     E7EMB6 262
#9:     C9JBE1 309

Or in baseR using strsplit instead

DF <- read.table(text = text, stringsAsFactors = FALSE, header = FALSE)
as.data.frame(lapply(DF, function(x) unlist(strsplit(x, split = ";", fixed = TRUE))))

data

text = "A0A024R442;E7ETB3;Q9ULA0;E7EMB6           443;465;447;382
A0A024R442;E7ETB3;Q9ULA0;E7EMB6;C9JBE1       323;345;327;262;309"

base R:

data.frame(matrix(unlist(strsplit(as.matrix(df),";")),ncol=2))

          X1  X2
1 A0A024R442 443
2     E7ETB3 465
3     Q9ULA0 447
4     E7EMB6 382
5 A0A024R442 323
6     E7ETB3 345
7     Q9ULA0 327
8     E7EMB6 262
9     C9JBE1 309

The separate_rows can be used as well

library(tidyverse)
separate_rows(df1, c("V1", "V2"))
#          V1  V2
#1 A0A024R442 443
#2     E7ETB3 465
#3     Q9ULA0 447
#4     E7EMB6 382
#5 A0A024R442 323
#6     E7ETB3 345
#7     Q9ULA0 327
#8     E7EMB6 262
#9     C9JBE1 309

Or using cSplit

library(splitstackshape)
cSplit(df1, c("V1", "V2"), ";", "long")

data

df1 <- structure(list(V1 = c("A0A024R442;E7ETB3;Q9ULA0;E7EMB6", 
      "A0A024R442;E7ETB3;Q9ULA0;E7EMB6;C9JBE1"
), V2 = c("443;465;447;382", "323;345;327;262;309")), 
  row.names = c(NA, 
-2L), class = "data.frame")

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM