How to separate a string of digits and letters of various length into different columns in R?
I have a column named 'WFBS' with more than a million rows of strings of varying length, like this:
WFBS <- c("M010203", "S01020304", "N104509")
I need output like the following:
WFBS1 <- c("M01", "S01", "N10")
WFBS2 <- c("02", "02", "45")
WFBS3 <- c("03", "03", "09")
WFBS4 <- c(NA, "04", NA)
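So I need to split each string as follows. First column: 3 characters (a letter followed by 2 digits). Remaining columns: 2 characters each, until no characters are left.
I tried the function strsplit, but it complained that my variable is not a character vector, so I created a vector x as follows: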
x <- as.character(WFBS)
但后來我不知道如何使用函數strsplit將字符串分隔成列。
Here is a base R option: use sub to insert a delimiter between the pieces, then read the result with read.csv to create a 4-column data.frame.
read.csv(text = sub("^(...)(..)(..)(.*)", "\\1,\\2,\\3,\\4", WFBS),
         header = FALSE, colClasses = rep("character", 4), na.strings = "",
         col.names = paste0("WFBS", 1:4), stringsAsFactors = FALSE)
#   WFBS1 WFBS2 WFBS3 WFBS4
# 1   M01    02    03  <NA>
# 2   S01    02    03    04
# 3   N10    45    09  <NA>
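The regular expression above fixes the result at four columns. If the longest string is not known in advance, the same rule (3 characters first, then 2 characters at a time) could be sketched with substr in base R. The helper name split_wfbs is hypothetical and not part of the original answer, and the sketch assumes the input contains no missing values.

```r
# Hypothetical helper: 3-character prefix, then 2-character chunks,
# with NA wherever a string has run out of characters.
split_wfbs <- function(x) {
  x <- as.character(x)                        # also accepts a factor column
  rest_len <- nchar(x) - 3                    # characters left after the prefix
  n_chunks <- max(ceiling(rest_len / 2), 0)   # the longest string sets the column count
  starts <- 4 + 2 * (seq_len(n_chunks) - 1)   # start position of each 2-character chunk
  # One column per chunk position; substr() yields "" past the end of a string.
  chunks <- do.call(cbind, lapply(starts, function(s) substr(x, s, s + 1)))
  out <- cbind(substr(x, 1, 3), chunks)
  out[out == ""] <- NA                        # empty pieces become NA
  out <- as.data.frame(out, stringsAsFactors = FALSE)
  names(out) <- paste0("WFBS", seq_len(ncol(out)))
  out
}

WFBS <- c("M010203", "S01020304", "N104509")
res <- split_wfbs(WFBS)
print(res)
```

Because substr is vectorised over the whole column, the only loop is over chunk positions, which should stay manageable for a million rows.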
This might be a useful starting point:
library(tidyr)
df <- data.frame(WFBS = c("M010203", "S01020304", "N104509"),
                 stringsAsFactors = FALSE)
df %>% separate(col = WFBS,
                into = c("WFBS1", "WFBS2", "WFBS3", "WFBS4"),
                sep = c(3, 5, 7))
#   WFBS1 WFBS2 WFBS3 WFBS4
# 1   M01    02    03
# 2   S01    02    03    04
# 3   N10    45    09
This leaves empty strings rather than NA in the positions where no characters remain, so you will have to convert them.
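One way to do that conversion in base R is a logical-matrix assignment. The sketch below is not part of the original answer; to stay self-contained it builds a stand-in data frame named out (a hypothetical name) with the same contents as the separate() result, rather than calling tidyr.

```r
# Stand-in for the separate() result: "" where a string had no characters left.
out <- data.frame(
  WFBS1 = c("M01", "S01", "N10"),
  WFBS2 = c("02", "02", "45"),
  WFBS3 = c("03", "03", "09"),
  WFBS4 = c("", "04", ""),
  stringsAsFactors = FALSE
)

# out == "" is a logical matrix; assigning through it replaces every empty string with NA.
out[out == ""] <- NA
print(out)
```

The same assignment can be applied directly to the data frame returned by separate().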