
splitting comma separated mixed text and numeric string with strsplit in R

I have many strings of the form "name1, name2 and name3, 0, 1, 2" or "name1, name2, name3 and name4, 0, 1, 2", and I would like to split each one into 4 elements, where the first element is the whole text string of names. The problem is that strsplit doesn't differentiate between text and numbers, so it splits the string into 5 elements in the first case and into 6 elements in the second. How can I tell R to dynamically skip the text part of the string when the number of names varies?
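For reference, here is a plain comma split applied to the two example strings from the question. The number of pieces depends on how many names there are.

```r
x <- c("name1, name2 and name3, 0, 1, 2",
       "name1, name2, name3 and name4, 0, 1, 2")

# Splitting on every comma also splits the list of names
pieces <- strsplit(x, ",")
lengths(pieces)
# [1] 5 6
```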

You have two main options:
(1) grep for the numbers, and extract those.
(2) split on the comma, then coerce to numeric and check for NAs.
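Option (1) can be sketched with base R regex functions. This is a minimal sketch that assumes every string ends in a run of comma-separated integers. The pattern and the variable names (`tail_pat`, `names_part`, `num_block`, `nums`) are illustrative and not part of the original answer. `sub()` removes the trailing numbers to leave the names, and `regmatches()` with `gregexpr()` pulls the numbers out.

```r
x <- c("name1, name2 and name3, 0, 1, 2",
       "name1, name2, name3 and name4, 0, 1, 2")

# Trailing block of numeric fields, e.g. ", 0, 1, 2" at the end of the string
tail_pat <- "(,\\s*[0-9]+)+$"

# Text part: everything before the trailing numeric block
names_part <- sub(tail_pat, "", x)

# Numeric part: extract the trailing block, then every run of digits inside it
# (assumes every string matches tail_pat; regmatches() drops non-matches)
num_block <- regmatches(x, regexpr(tail_pat, x))
nums <- lapply(regmatches(num_block, gregexpr("[0-9]+", num_block)), as.numeric)
```

Because this approach anchors on the trailing numbers, it does not matter how many commas appear in the names.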

I prefer the second:

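x <- "name1, name2 and name3, 0, 1, 2"
splat <- strsplit(x, ",")[[1]]                        # split on every comma
numbs <- !is.na(suppressWarnings(as.numeric(splat)))  # TRUE where a piece parses as a number

c(paste(splat[!numbs], collapse=","), splat[numbs])   # rejoin the text pieces, keep the numbers
# [1] "name1, name2 and name3" " 0" " 1" " 2"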
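The question mentions many strings, so the same idea can be wrapped in a helper and applied to a whole vector. This sketch assumes that every string has exactly three numbers, which means each result has 4 elements. `split_names_numbers` is a hypothetical name. `trimws()` is added to drop the leading spaces left on the numbers.

```r
x <- c("name1, name2 and name3, 0, 1, 2",
       "name1, name2, name3 and name4, 0, 1, 2")

# Hypothetical helper: split on commas, rejoin the non-numeric pieces,
# and trim the spaces left in front of the numbers
split_names_numbers <- function(s) {
  splat <- strsplit(s, ",")[[1]]
  numbs <- !is.na(suppressWarnings(as.numeric(splat)))
  c(paste(splat[!numbs], collapse = ","), trimws(splat[numbs]))
}

# vapply() checks that every result has exactly 4 elements;
# t() turns the result into one row per input string
res <- t(vapply(x, split_names_numbers, character(4), USE.NAMES = FALSE))
res
```

If the count of numbers can vary, `lapply()` in place of `vapply()` returns a list and avoids the fixed-length check.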

You could also insert a delimiter in the right places, and then split on that:

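strr <- c("name1, name2 and name3, 0, 1, 2", "name1, name2, name3 and name4, 0, 1, 2")
delimmed <- gsub('(.*[a-z][0-9]+| [0-9]+),','\\1%',strr)  # replace the splitting commas with %
strsplit(delimmed,'%')                                     # then split on %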

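The first part of the regular expression (to the left of the |) matches everything (.*) up to the final letter-number-comma combination. The second part matches any space-number-comma combination. In both cases the comma is outside the parentheses, so it is not part of the captured group: it is dropped and replaced by %. This relies on the last name ending in a letter followed by digits (like name3) and on % not appearing anywhere in the data.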
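The gsub approach has a limitation: when the last name does not end in digits, the first alternative never matches. The test string below is made up to show this case. The second split in the sketch is a variation that does not come from the answer above. It uses a Perl-style lookahead (`perl = TRUE`) to split only on commas that are followed by a purely numeric field. This way, it does not matter how the names end.

```r
strr2 <- "Alice, Bob and Carol, 0, 1, 2"  # made-up names without trailing digits

# The %-delimiter approach misses the comma after "Carol"
bad <- strsplit(gsub('(.*[a-z][0-9]+| [0-9]+),', '\\1%', strr2), '%')[[1]]
bad
# [1] "Alice, Bob and Carol, 0" " 1"                      " 2"

# Split on a comma (plus optional spaces) only if a number follows,
# running up to the next comma or the end of the string
good <- strsplit(strr2, ",\\s*(?=[0-9]+(,|$))", perl = TRUE)[[1]]
good
```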

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.
