
separate character string at first digit with “*” in the string

This is an easy one I think but I cannot see what I'm missing. I want to split the string at the first digit. Works great until there is a non-alphanumeric symbol in the string. Help!

Works:

library(tidyr)
pet <- c("Dog 100", "Cat? 340")
df <- as.data.frame(pet)
# Split where optional whitespace sits between a letter and a digit
df_split <- separate(df, pet, into = c("Animal", "Total"), sep = "(?<=[a-zA-Z])\\s*(?=[0-9])")

The first row splits correctly, but the second row does not split at all. Where am I going wrong?

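We can use read.table from base R, after removing the literal ? with sub so that each string is just two whitespace-separated fields: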

read.table(text = sub("?", "", df$pet, fixed = TRUE), header = FALSE,
  col.names = c("Animal", "Total"))
#    Animal Total
#1    Dog   100
#2    Cat   340
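The sub call above removes only the first literal ? in each string. If other symbols can appear (the title mentions *), one possible variation is to drop every character that is neither alphanumeric nor whitespace before reading. This is only a sketch: the "Bird* 7" value is a made-up example, and it assumes that losing the symbols is acceptable and that the animal name is a single word.

```r
# Hypothetical input: the third value is invented to show a second symbol
pet <- c("Dog 100", "Cat? 340", "Bird* 7")

# Remove everything that is not a letter, a digit, or whitespace
clean <- gsub("[^[:alnum:][:space:]]", "", pet)

# Each cleaned string is now "<name> <number>", so read.table can split it
res <- read.table(text = clean, header = FALSE,
                  col.names = c("Animal", "Total"),
                  stringsAsFactors = FALSE)
res
```

Note that Total is read as an integer column here, whereas separate and extract return character columns unless convert = TRUE is set.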

Note that for the current scenario, it is enough to split on one or more whitespace characters that are followed by one or more digits running to the end of the string:

separate(df, pet, into = c("Animal", "Total"), sep = "\\s+(?=[0-9]+$)")
# =>   Animal Total
# => 1    Dog   100
# => 2   Cat?   340

See the regex demo.
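As for why the pattern in the question fails on the second row: the lookbehind (?<=[a-zA-Z]) requires a letter immediately before the whitespace, and in "Cat? 340" that character is ?. The sketch below checks both patterns with base R and perl = TRUE, on the assumption that the regex engine behaves the same way as the one tidyr uses for patterns this simple; the variable names are illustrative only.

```r
pet <- c("Dog 100", "Cat? 340")

# Pattern from the question: needs a letter right before the whitespace
original <- "(?<=[a-zA-Z])\\s*(?=[0-9])"
# Pattern from this answer: only looks at what follows the whitespace
relaxed <- "\\s+(?=[0-9]+$)"

# Does each pattern find a split point in each string?
found_original <- grepl(original, pet, perl = TRUE)
found_relaxed  <- grepl(relaxed, pet, perl = TRUE)

# Splitting with the relaxed pattern keeps the "?" in the first piece
pieces <- strsplit(pet, relaxed, perl = TRUE)
```

Here found_original is FALSE for "Cat? 340", which is why separate leaves that row unsplit, while found_relaxed is TRUE for both strings.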

However, in the general case, it is much easier to use tidyr::extract here, since the pattern you need will be much simpler:

^(\D*?)\s*(\d.*)

Note that if your strings can contain newlines, you will need to prepend (?s) to the pattern. This is the so-called DOTALL modifier, which allows . to match line break characters in an ICU pattern.

See the regex demo.
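The effect of (?s) can be seen on a string with an embedded newline. This is a small sketch using base R's regexec with perl = TRUE (PCRE rather than ICU, assumed to treat (?s) the same way for this pattern); the input string is invented for illustration.

```r
x <- "Dog 1\n00"   # hypothetical value containing a line break
pat <- "^(\\D*?)\\s*(\\d.*)"

# Without (?s): . stops at the newline, so Group 2 is cut short
m_default <- regmatches(x, regexec(pat, x, perl = TRUE))[[1]]

# With (?s): . also matches the newline, so Group 2 runs to the end
m_dotall <- regmatches(x, regexec(paste0("(?s)", pat), x, perl = TRUE))[[1]]

m_default[3]
m_dotall[3]
```

Element 1 of each result is the whole match; elements 2 and 3 are Group 1 and Group 2.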

Regex details

  • ^ - start of string
  • (\D*?) - Group 1 (here, Animal column): any 0+ non-digit symbols, as few as possible
  • \s* - 0 or more whitespaces
  • (\d.*) - Group 2 (here, Total column): a digit followed with any 0+ chars (other than line break chars if (?s) is not used), as many as possible (* is a greedy quantifier).

R code snippet:

library(tidyr)
df_split <- extract(df, pet, into = c("Animal", "Total"), regex = "^(\\D*?)\\s*(\\d.*)")
df_split
# =>   Animal Total
# => 1   Dog    100
# => 2  Cat?    340
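If tidyr is not available, the same two-group pattern can be applied with base R. The following is a rough equivalent rather than what extract does internally; it assumes perl = TRUE handles the pattern the same way, and the helper variable names are made up for the example.

```r
pet <- c("Dog 100", "Cat? 340")
pat <- "^(\\D*?)\\s*(\\d.*)"

# One character vector per input: whole match, Group 1, Group 2
m <- regmatches(pet, regexec(pat, pet, perl = TRUE))

# Pick the groups out; strings that do not match yield NA
df_split <- data.frame(
  Animal = vapply(m, function(g) g[2], character(1)),
  Total  = vapply(m, function(g) g[3], character(1)),
  stringsAsFactors = FALSE
)
df_split
```

Because \s* sits between the two groups, the whitespace is consumed by neither of them, so Animal has no trailing space.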
