How to separate a string of digits and letters of various length into different columns in R?

I have a column called 'WFBS' that has over a million rows of strings of different lengths that look like this:

WFBS <- c("M010203", "S01020304", "N104509")

and I need an output that looks like this:

WFBS1 <- c("M01", "S01", "N10")
WFBS2 <- c("02", "02", "45")
WFBS3 <- c("03", "03", "09")
WFBS4 <- c(NA, "04", NA)

So I need to split each string as follows: the first column takes 3 characters (i.e. the letter followed by 2 digits), and each remaining column takes 2 characters, until no characters are left.

I tried using the function strsplit, but it says that my variables are not characters, so then I created a vector x as follows:

x <- as.character(WFBS)

but then I don't know how to separate the string into columns with the function strsplit.

One option in base R: insert a delimiter with sub, then read the result with read.csv to create a 4-column data.frame.

read.csv(text = sub("^(...)(..)(..)(.*)", "\\1,\\2,\\3,\\4", WFBS),
         header = FALSE, colClasses = rep("character", 4), na.strings = "",
         col.names = paste0("WFBS", 1:4), stringsAsFactors = FALSE)
#    WFBS1 WFBS2 WFBS3 WFBS4
#1   M01    02    03  <NA>
#2   S01    02    03    04
#3   N10    45    09  <NA>
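
The pattern above hard-codes four fields, so any characters beyond the ninth all end up in WFBS4. If the strings can be longer, a rough base R sketch along these lines derives the number of columns from the longest string and cuts each column by position with substr. The names x, n_cols, starts, ends, pieces and out are made up for illustration, and the sketch assumes every string has at least 3 characters and no missing values.

```r
WFBS <- c("M010203", "S01020304", "N104509")

x <- as.character(WFBS)  # also covers the case where the column is a factor

# One 3-character column plus as many 2-character columns as the longest string needs
n_cols <- ceiling((max(nchar(x)) - 3) / 2) + 1

# Start and end positions of each column: 1-3, 4-5, 6-7, 8-9, ...
starts <- c(1, seq(4, by = 2, length.out = n_cols - 1))
ends   <- c(3, starts[-1] + 1)

# Cut every column by position; strings that are too short yield "",
# which is replaced by NA
pieces <- lapply(seq_len(n_cols), function(i) {
  piece <- substr(x, starts[i], ends[i])
  piece[!nzchar(piece)] <- NA
  piece
})
names(pieces) <- paste0("WFBS", seq_len(n_cols))

out <- as.data.frame(pieces, stringsAsFactors = FALSE)
out
```

Because substr is vectorised over the whole column, this loops over the handful of output columns rather than over the million rows.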

This might be a useful starting point:

library(tidyr)
df <- data.frame(WFBS = c("M010203", "S01020304", "N104509"),
                 stringsAsFactors = FALSE)
df %>% separate(col = WFBS,
                into = c("WFBS1","WFBS2","WFBS3","WFBS4"),
                sep = c(3,5,7))
#  WFBS1 WFBS2 WFBS3 WFBS4
#1   M01    02    03
#2   S01    02    03    04
#3   N10    45    09

Note that this leaves empty strings rather than NAs in the unused trailing columns, so you would have to convert those yourself.
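
One possible way to do that conversion in base R is sketched below. To keep the snippet self-contained, res is a hand-built stand-in for the separate() result (the name and the manual construction are assumptions for illustration, not part of the answer above).

```r
# Stand-in for the data frame returned by separate(), with "" in the unused slots
res <- data.frame(WFBS1 = c("M01", "S01", "N10"),
                  WFBS2 = c("02", "02", "45"),
                  WFBS3 = c("03", "03", "09"),
                  WFBS4 = c("", "04", ""),
                  stringsAsFactors = FALSE)

# res == "" is a logical matrix marking the empty cells; assign NA to those cells
res[res == ""] <- NA
res
```

The columns stay character after the assignment, so values such as "02" keep their leading zero.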
