
How to separate a string of digits and letters of various length into different columns in R?

I have a column called 'WFBS' with over a million rows of strings of different lengths that look like this:

WFBS <- c("M010203", "S01020304", "N104509")

and I need an output that looks like this:

WFBS1 <- c("M01", "S01", "N10")
WFBS2 <- c("02", "02", "45")
WFBS3 <- c("03", "03", "09")
WFBS4 <- c(NA, "04", NA)

So I need to split each string as follows: the first column takes 3 characters (i.e. the letter followed by 2 digits), and each remaining column takes 2 characters, until no characters are left.

I tried using the function strsplit, but it says that my variables are not characters, so I created a vector x as follows:

x <- as.character(WFBS)

but then I don't know how to separate the string into columns with the function strsplit.

An option in base R: insert a delimiter with sub, then read the result with read.csv to create a 4-column data.frame.

read.csv(text = sub("^(...)(..)(..)(.*)", "\\1,\\2,\\3,\\4", WFBS),
         header = FALSE, colClasses = rep("character", 4), na.strings = "",
         col.names = paste0("WFBS", 1:4), stringsAsFactors = FALSE)
#    WFBS1 WFBS2 WFBS3 WFBS4
#1   M01    02    03  <NA>
#2   S01    02    03    04
#3   N10    45    09  <NA>
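
The pattern above is fixed at four columns, so anything beyond the ninth character ends up in WFBS4. If longer strings are possible, a minimal base R sketch along the following lines might generalise the idea by cutting at computed positions with substring. The helper names (max_len, starts, ends, pieces, out) are made up for illustration, and the sketch assumes at least one string is longer than 3 characters.

```r
WFBS <- c("M010203", "S01020304", "N104509")

# Piece boundaries: characters 1-3 first, then 4-5, 6-7, 8-9, ...
# (assumes max_len >= 4, otherwise seq() below would fail)
max_len <- max(nchar(WFBS))
starts  <- c(1, seq(4, max_len, by = 2))
ends    <- c(3, starts[-1] + 1)

# One column per piece; substring() returns "" when a string is too short
pieces <- do.call(cbind, lapply(seq_along(starts), function(i) {
  substring(WFBS, starts[i], ends[i])
}))
pieces[pieces == ""] <- NA

out <- as.data.frame(pieces, stringsAsFactors = FALSE)
names(out) <- paste0("WFBS", seq_along(starts))
out
```

The number of columns follows from the longest string, so nothing has to be hard-coded; shorter strings are padded with NA.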

This might be a useful starting point:

library(tidyr)
df <- data.frame(WFBS = c("M010203", "S01020304", "N104509"),
                 stringsAsFactors = FALSE)
df %>% separate(col = WFBS,
                into = c("WFBS1", "WFBS2", "WFBS3", "WFBS4"),
                sep = c(3, 5, 7))
#   WFBS1 WFBS2 WFBS3 WFBS4
# 1   M01    02    03      
# 2   S01    02    03    04
# 3   N10    45    09      

This leaves you with empty strings rather than NAs in the remainder spots, which you'd have to convert.
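
One way that conversion could be done in base R is sketched below. To keep the example self-contained, res is a hand-built stand-in for the separate() result shown above (the name is hypothetical); in practice you would apply the last step to the actual result.

```r
# Stand-in for the separate() output, with "" in the remainder spots
res <- data.frame(WFBS1 = c("M01", "S01", "N10"),
                  WFBS2 = c("02", "02", "45"),
                  WFBS3 = c("03", "03", "09"),
                  WFBS4 = c("", "04", ""),
                  stringsAsFactors = FALSE)

# Replace every empty string with NA; the is.na() guard keeps the
# logical index free of NAs if some cells are already missing
res[!is.na(res) & res == ""] <- NA
res
```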
