R stringr rebus

I am trying to extract information from strings but can't get the result I want. Each string usually contains four numbers (sometimes only three), and a number is sometimes followed by "/" and one or more words, which should be kept. Here is what I have tried:

library(stringr)
library(rebus)

patrn <- one_or_more(DGT) %R% DOT %R% one_or_more(DGT) %R% optional("/") %R% optional(one_or_more(WRD))

test %>% 
  str_extract_all(., patrn)

All I get is the first letter of the word. I have also tried "[aA-zZ]+", but I still get only the first letter. I would like the numbers separated as shown below, but with whatever follows each number included as well. I'm not sure whether I should use str_split instead, because sometimes the pieces run together, as in element [[4]] of the example.

[[1]]
[1] "20.0" "17.0" "19.0" "20.0"

[[2]]
[1] "12.0" "17.0" "20.0" "14.0"

[[3]]
[1] "15.5" "19.0" "12.5"

[[4]]
[1] "15.0" "17.5" "13.5" "11.5"

data:

test <- c("20.0/Ready Credit 17.0 19.0/Gashaw Boko 20.0", "12.0/Splendid Justine 17.0 20.0/Ranch Pronto 14.0", 
    "15.5/Norman Price 19.0 12.5", "15.0/Hell Broke Luce17.5/Double Boost 13.5 11.5")

Your code generates the following pattern:

 <regex> [\d]+\.[\d]+[/]?[[\w]+]?

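I believe the optional tokens should be wrapped in parentheses (a group) rather than brackets (a character class). Inside brackets, [[\w]+]? matches at most a single character, which is why only the first letter is returned. With parentheses, the pattern becomes: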

 <regex> [\d]+\.[\d]+(/)?([\w]+)?

 Or even simpler:

 <regex> [\d]+\.[\d]+(/[\w]+)?
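
To make the difference concrete, here is a minimal sketch that runs both forms with plain stringr on a single string. The variable names (x, bracketed, grouped) are only for illustration, and the patterns are typed by hand rather than generated by rebus.

```r
library(stringr)

x <- "19.0/Gashaw Boko 20.0"

# Brackets build a character class: the trailing [[\w]+]? can consume
# at most ONE character, so the word is cut short.
bracketed <- "[\\d]+\\.[\\d]+[/]?[[\\w]+]?"

# Parentheses build a group: the + now repeats the word characters,
# and the whole "/word" part is optional.
grouped <- "[\\d]+\\.[\\d]+(/[\\w]+)?"

bracketed_match <- str_extract(x, bracketed)
grouped_match   <- str_extract(x, grouped)

print(bracketed_match)  # stops right after the first letter, as in the question
print(grouped_match)    # "19.0/Gashaw"
```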

Therefore, as a workaround, I have changed your pattern construction to look like the following:

 patrn <- one_or_more(DGT) %R% DOT %R% one_or_more(DGT) %R% "(/" %R% one_or_more(WRD) %R% ")?"
 patrn
 #<regex> [\d]+\.[\d]+(/[\w]+)?

You can also pass the generated pattern directly as a string:

test %>% 
  str_extract_all(., '[\\d]+\\.[\\d]+(/[\\w]+)?')

With this pattern you get the following output (note that \w does not match spaces, so only the first word after each "/" is kept):

[[1]]
[1] "20.0/Ready"  "17.0"        "19.0/Gashaw" "20.0"       

[[2]]
[1] "12.0/Splendid" "17.0"          "20.0/Ranch"    "14.0"         

[[3]]
[1] "15.5/Norman" "19.0"        "12.5"       

[[4]]
[1] "15.0/Hell"   "17.5/Double" "13.5"        "11.5"  
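
If every word after the slash should be kept, the group can be extended to allow further words separated by single spaces. The following is a sketch under the assumption that the names consist only of ASCII letters and single spaces; patrn_words and res are illustrative names, not part of the original code.

```r
library(stringr)

test <- c("20.0/Ready Credit 17.0 19.0/Gashaw Boko 20.0",
          "12.0/Splendid Justine 17.0 20.0/Ranch Pronto 14.0",
          "15.5/Norman Price 19.0 12.5",
          "15.0/Hell Broke Luce17.5/Double Boost 13.5 11.5")

# A number, optionally followed by "/" and one or more letter-only words.
# Additional words must each be preceded by exactly one space, so the
# match stops before the space that precedes the next number.
patrn_words <- "\\d+\\.\\d+(?:/[A-Za-z]+(?: [A-Za-z]+)*)?"

res <- str_extract_all(test, patrn_words)

res[[4]]
# [1] "15.0/Hell Broke Luce" "17.5/Double Boost"    "13.5"                 "11.5"
```

Because the words are matched explicitly, no trailing whitespace ends up in the result, and the run-together case in the fourth string is still split at the next number.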

Hope it helps.

Is this what you want?

library(stringr)

test <- c("20.0/Ready Credit 17.0 19.0/Gashaw Boko 20.0", 
          "12.0/Splendid Justine 17.0 20.0/Ranch Pronto 14.0", 
          "15.5/Norman Price 19.0 12.5", 
          "15.0/Hell Broke Luce17.5/Double Boost 13.5 11.5")

str_extract_all(test, "[0-9]+\\.[0-9]+[^0-9]*")

[[1]]
[1] "20.0/Ready Credit " "17.0 "              "19.0/Gashaw Boko "  "20.0"              

[[2]]
[1] "12.0/Splendid Justine " "17.0 "                  "20.0/Ranch Pronto "     "14.0"                  

[[3]]
[1] "15.5/Norman Price " "19.0 "              "12.5"              

[[4]]
[1] "15.0/Hell Broke Luce" "17.5/Double Boost "   "13.5 "                "11.5"                
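
Several of the matches above end in a space, because [^0-9]* consumes everything up to the next digit. One possible clean-up, sketched here with the illustrative names raw and trimmed, is to apply str_trim to each list element:

```r
library(stringr)

test <- c("20.0/Ready Credit 17.0 19.0/Gashaw Boko 20.0",
          "12.0/Splendid Justine 17.0 20.0/Ranch Pronto 14.0",
          "15.5/Norman Price 19.0 12.5",
          "15.0/Hell Broke Luce17.5/Double Boost 13.5 11.5")

# Extract each number together with all following non-digit characters.
raw <- str_extract_all(test, "[0-9]+\\.[0-9]+[^0-9]*")

# Remove leading and trailing whitespace from every match.
trimmed <- lapply(raw, str_trim)

trimmed[[1]]
# [1] "20.0/Ready Credit" "17.0"              "19.0/Gashaw Boko"  "20.0"
```

Note that this pattern keeps any non-digit text after a number, not only text introduced by "/", and it would cut off a name that itself contains a digit.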
