separate character string at first digit with “*” in the string

Question

This is an easy one I think but I cannot see what I'm missing. I want to split the string at the first digit. Works great until there is a non-alphanumeric symbol in the string. Help!

Works:

library(tidyr)

pet <- c("Dog 100", "Cat? 340")
df <- as.data.frame(pet)
df_split <- separate(df, pet, into = c("Animal", "Total"), sep = "(?<=[a-zA-Z])\\s*(?=[0-9])")

The first row ("Dog 100") splits correctly, but the second row ("Cat? 340") does not split. Where am I going wrong?

Answer 1

We can use read.table from base R after removing the "?" with sub (note that this drops the "?" from the result):

read.table(text = sub("?", "", df$pet, fixed = TRUE), header = FALSE,
  col.names = c("Animal", "Total"))
#    Animal Total
#1    Dog   100
#2    Cat   340
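If the "?" should stay in the Animal column, a possible variation is to skip the sub() step, since read.table splits on whitespace by default. This is only a sketch for the sample data in the question; it assumes every row has exactly one whitespace-separated name followed by one number, so a multi-word name would break it. The stringsAsFactors = FALSE argument is my addition so the column stays character on older R versions.

```r
# Sample data from the question
pet <- c("Dog 100", "Cat? 340")

# read.table splits each line on whitespace (sep = "" is the default),
# so the "?" is simply kept as part of the first field
res <- read.table(text = pet, header = FALSE,
                  col.names = c("Animal", "Total"),
                  stringsAsFactors = FALSE)

print(res)
```

Here Total is type-converted to integer by read.table, while Animal remains a character column.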

Answer 2

Note that for the current scenario, it is enough to split on one or more whitespace characters that are followed by one or more digits up to the end of the string:

separate(df, pet, into = c("Animal", "Total"), sep = "\\s+(?=[0-9]+$)")
## =>   Animal Total
## => 1    Dog   100
## => 2   Cat?   340

See the regex demo.
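As for why the pattern in the question fails on "Cat? 340": the lookbehind (?<=[a-zA-Z]) requires a letter right before the whitespace, and in that string the preceding character is "?". A quick way to see this is the base R sketch below. Using grepl with perl = TRUE is my choice for illustration (it is not part of the question's code), and the variable names are made up for the example.

```r
# Sample data from the question
pet <- c("Dog 100", "Cat? 340")

# Separator from the question: needs a letter immediately before the whitespace
orig_sep <- "(?<=[a-zA-Z])\\s*(?=[0-9])"
# Separator from this answer: whitespace followed only by digits up to the end
new_sep <- "\\s+(?=[0-9]+$)"

# Does each pattern find a split point in each string?
orig_hits <- grepl(orig_sep, pet, perl = TRUE)
new_hits <- grepl(new_sep, pet, perl = TRUE)

print(data.frame(pet, orig_hits, new_hits))
```

The question's pattern finds a split point only in "Dog 100", while the whitespace-plus-digits pattern finds one in both strings.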

However, in the general case, it is much easier to use tidyr::extract here, since the pattern you need will be much simpler:

^(\D*?)\s*(\d.*)

Note that if your strings can have newlines, you will need to prepend the pattern with (?s), the so-called DOTALL modifier that allows . to match line break chars in an ICU pattern.

See the regex demo.

Regex details

^ - start of string
(\D*?) - Group 1 (here, Animal column): any 0+ non-digit symbols, as few as possible
\s* - 0 or more whitespaces
(\d.*) - Group 2 (here, Total column): a digit followed with any 0+ chars (other than line break chars if (?s) is not used), as many as possible (* is a greedy quantifier).
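To check the two groups described above without tidyr, here is a small base R sketch. It assumes R 3.3.0 or later (where regexec accepts perl = TRUE) and uses the PCRE engine rather than ICU, which should behave the same for this particular pattern. The multi-line string is a made-up example to show the effect of (?s).

```r
# Pattern from the explanation above
pattern <- "^(\\D*?)\\s*(\\d.*)"
pet <- c("Dog 100", "Cat? 340")

# Each list element holds: full match, Group 1, Group 2
m <- regmatches(pet, regexec(pattern, pet, perl = TRUE))
parts <- do.call(rbind, m)
animal <- parts[, 2]  # Group 1, no trailing whitespace
total <- parts[, 3]   # Group 2

# Hypothetical input containing a line break
multi <- "Cat? 340\nsecond line"
# Without (?s), . stops at the line break
without_s <- regmatches(multi, regexec(pattern, multi, perl = TRUE))[[1]]
# With (?s), . also matches the line break, so Group 2 runs to the end
with_s <- regmatches(multi, regexec(paste0("(?s)", pattern), multi, perl = TRUE))[[1]]

print(animal)
print(total)
print(without_s[3])
print(with_s[3])
```

Because \s* sits between the two groups, the whitespace before the first digit ends up in neither column.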

R code snippet:

library(tidyr)
# Use the pattern explained above; \\s* keeps the whitespace out of both groups
df_split <- extract(df, pet, into = c("Animal", "Total"), regex = "^(\\D*?)\\s*(\\d.*)")
df_split
# =>   Animal Total
# => 1    Dog   100
# => 2   Cat?   340

2 answers

solution1
1 2019-10-02 17:26:09

solution2
1 ACCEPTED 2019-10-21 18:26:44
