I have a column called 'WFBS' that has over a million rows of strings of different lengths that look like this:
WFBS <- c("M010203", "S01020304", "N104509")
and I need an output that looks like this:
WFBS1 <- c("M01", "S01", "N10")
WFBS2 <- c("02", "02", "45")
WFBS3 <- c("03", "03", "09")
WFBS4 <- c(NA, "04", NA)
So I need to split each string into columns: the first column takes 3 characters (i.e. the letter followed by 2 digits), and each remaining column takes 2 characters, until no characters are left.
I tried using the function strsplit, but it says that my variables are not characters, so then I created a vector x as follows:
x <- as.character(WFBS)
but then I don't know how to separate the string into columns with the function strsplit.
One option in base R: insert a comma delimiter with sub, then read the result with read.csv to create a 4-column data.frame.
read.csv(text = sub("^(...)(..)(..)(.*)", "\\1,\\2,\\3,\\4", WFBS),
         header = FALSE, colClasses = rep("character", 4), na.strings = "",
         col.names = paste0("WFBS", 1:4), stringsAsFactors = FALSE)
# WFBS1 WFBS2 WFBS3 WFBS4
#1 M01 02 03 <NA>
#2 S01 02 03 04
#3 N10 45 09 <NA>
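The pattern above is fixed at four columns, so anything past the ninth character ends up in WFBS4. If the strings can be longer, the following is a minimal sketch of a more general approach, assuming the layout from the question (3 characters first, then 2 per column). It derives the column count from the longest string and slices with the vectorised substring; the names x, n_cols, starts, stops, parts and res are illustrative only.

```r
WFBS <- c("M010203", "S01020304", "N104509")
x <- as.character(WFBS)  # also covers the case where the column is a factor

# Columns needed: one 3-character column plus enough 2-character columns
# to hold the longest string.
n_cols <- max(ceiling((nchar(x) - 3) / 2), 0, na.rm = TRUE) + 1
starts <- c(1, seq(4, by = 2, length.out = n_cols - 1))
stops  <- c(3, starts[-1] + 1)

# substring() is vectorised, so all pieces are cut in one call and then
# arranged column by column into a matrix with one row per input string.
parts <- matrix(substring(rep(x, times = n_cols),
                          rep(starts, each = length(x)),
                          rep(stops,  each = length(x))),
                nrow = length(x))
parts[parts == ""] <- NA  # positions past the end of a string become NA

res <- as.data.frame(parts, stringsAsFactors = FALSE)
names(res) <- paste0("WFBS", seq_len(n_cols))
res
```

Because every step is vectorised, this should stay workable for a column with over a million rows, although that has not been timed here.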
This might be a useful starting point:
library(tidyr)
df <- data.frame(WFBS = c("M010203", "S01020304", "N104509"),
stringsAsFactors = FALSE)
df %>% separate(col = WFBS,
                into = c("WFBS1", "WFBS2", "WFBS3", "WFBS4"),
                sep = c(3, 5, 7))
WFBS1 WFBS2 WFBS3 WFBS4
1 M01 02 03
2 S01 02 03 04
3 N10 45 09
This leaves empty strings rather than NAs in the unfilled trailing columns, so you would have to convert them yourself.
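One way to do that conversion in base R is logical indexing on the whole data frame. This is a minimal sketch: the data frame out is a hand-built stand-in for the result of separate (so the snippet runs without tidyr), and it assumes the columns contain no NAs yet.

```r
# Stand-in for the separate() result: trailing pieces are "" instead of NA.
out <- data.frame(WFBS1 = c("M01", "S01", "N10"),
                  WFBS2 = c("02", "02", "45"),
                  WFBS3 = c("03", "03", "09"),
                  WFBS4 = c("", "04", ""),
                  stringsAsFactors = FALSE)

# out == "" is a logical matrix with the same shape as out;
# every TRUE cell is replaced by NA, and the columns stay character.
out[out == ""] <- NA
out
```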