How to separate a string of digits and letters of various length into different columns in R?
I have a column called 'WFBS' that has over a million rows of strings of different lengths that look like this:
WFBS <- c("M010203", "S01020304", "N104509")
and I need an output that looks like this:
WFBS1 <- c("M01", "S01", "N10")
WFBS2 <- c("02", "02", "45")
WFBS3 <- c("03", "03", "09")
WFBS4 <- c(NA, "04", NA)
So I need to separate each string as follows: the first column takes 3 characters (i.e. the letter followed by 2 digits), and each remaining column takes 2 characters until there are no characters left.
I tried using the function strsplit, but it says that my variables are not characters, so I created a vector x as follows:
x <- as.character(WFBS)
but then I don't know how to separate the string into columns with the function strsplit.
An option with base R: create a delimiter using sub, then read the result with read.csv to create a 4-column data.frame.
read.csv(text = sub("^(...)(..)(..)(.*)", "\\1,\\2,\\3,\\4", WFBS),
         header = FALSE, colClasses = rep("character", 4), na.strings = "",
         col.names = paste0("WFBS", 1:4), stringsAsFactors = FALSE)
#   WFBS1 WFBS2 WFBS3 WFBS4
# 1   M01    02    03  <NA>
# 2   S01    02    03    04
# 3   N10    45    09  <NA>
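The pattern above is written for exactly four columns. Since the question asks for 2-character pieces "until no characters are left", here is a minimal base R sketch that derives the number of columns from the longest string and cuts the pieces with substring. It is an illustration rather than part of the original answer; the names x, n_extra, starts, stops, cols and out are made up for this example, and it assumes every string starts with a 3-character prefix.

```r
WFBS <- c("M010203", "S01020304", "N104509")
x <- as.character(WFBS)  # also covers the case where WFBS is a factor

# Number of 2-character columns needed after the 3-character prefix,
# based on the longest string in the vector.
n_extra <- max(0, ceiling((max(nchar(x), na.rm = TRUE) - 3) / 2))

# Start and stop positions of each piece: 1-3, 4-5, 6-7, ...
starts <- c(1, 4 + 2 * (seq_len(n_extra) - 1))
stops  <- c(3, starts[-1] + 1)

# substring() is vectorised over x, so each call cuts one whole column.
cols <- lapply(seq_along(starts), function(i) {
  piece <- substring(x, starts[i], stops[i])
  piece[piece == ""] <- NA  # positions past the end of a string become NA
  piece
})
names(cols) <- paste0("WFBS", seq_along(cols))

out <- as.data.frame(cols, stringsAsFactors = FALSE)
out
```

Because the work is done column by column rather than row by row, this should scale reasonably to a million rows, though that has not been benchmarked here.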
This might be a useful starting point:
library(tidyr)
df <- data.frame(WFBS = c("M010203", "S01020304", "N104509"),
                 stringsAsFactors = FALSE)
df %>% separate(col = WFBS,
                into = c("WFBS1", "WFBS2", "WFBS3", "WFBS4"),
                sep = c(3, 5, 7))
#   WFBS1 WFBS2 WFBS3 WFBS4
# 1   M01    02    03
# 2   S01    02    03    04
# 3   N10    45    09
This leaves you with empty strings rather than NAs in the unfilled positions, which you'd have to convert.
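One way to do that conversion in base R is a logical-matrix assignment. The sketch below is an assumption about how you might handle it, not part of the original answer; res is a hypothetical stand-in for the separate() result, built by hand so the example runs without tidyr.

```r
# Hand-built copy of the separate() result, with "" in the unfilled slots.
res <- data.frame(WFBS1 = c("M01", "S01", "N10"),
                  WFBS2 = c("02", "02", "45"),
                  WFBS3 = c("03", "03", "09"),
                  WFBS4 = c("", "04", ""),
                  stringsAsFactors = FALSE)

# res == "" is a logical matrix; every TRUE cell is replaced with NA.
res[res == ""] <- NA
res
```

The columns stay character after the assignment, so the result matches the NA layout requested in the question.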