I am trying to extract information from strings and can't get what I want. Each string usually contains 4 (but sometimes only 3) numbers, and a number is sometimes followed by "/" and one or more words, which should be preserved. Here is what I have tried:
library(stringr)
library(rebus)
patrn <- one_or_more(DGT) %R% DOT %R% one_or_more(DGT) %R% optional("/") %R% optional(one_or_more(WRD))
test %>%
str_extract_all(., patrn)
All I get is the first letter of the word. I have also tried "[aA-zZ]+", but I still only get the first letter. I would like the numbers separated as shown below, but with whatever follows each number included as well. I'm not sure whether I should use str_split, because sometimes the pieces run together, as in [[4]] of the example.
[[1]]
[1] "20.0" "17.0" "19.0" "20.0"
[[2]]
[1] "12.0" "17.0" "20.0" "14.0"
[[3]]
[1] "15.5" "19.0" "12.5"
[[4]]
[1] "15.0" "17.5" "13.5" "11.5"
Data:
test <- c("20.0/Ready Credit 17.0 19.0/Gashaw Boko 20.0", "12.0/Splendid Justine 17.0 20.0/Ranch Pronto 14.0",
"15.5/Norman Price 19.0 12.5", "15.0/Hell Broke Luce17.5/Double Boost 13.5 11.5")
I noticed that your generated pattern would look like the following:
<regex> [\d]+\.[\d]+[/]?[[\w]+]?
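Square brackets define a character class, so [[\w]+]? matches at most one character (a word character or a literal "+"), which is why you only get the first letter. I believe the optional tokens should be placed inside parentheses (instead of brackets), as follows: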
<regex> [\d]+\.[\d]+(/)?([\w]+)?
Or even simpler:
<regex> [\d]+\.[\d]+(/[\w]+)?
Therefore, as a workaround, I have changed your pattern construction to look like the following:
patrn <- one_or_more(DGT) %R% DOT %R% one_or_more(DGT) %R% "(/" %R% one_or_more(WRD) %R% ")?"
patrn
#<regex> [\d]+\.[\d]+(/[\w]+)?
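To make the difference concrete, here is a minimal sketch that runs both patterns on one short string. The variable names (x, bracketed, grouped) are made up for this illustration, and the bracketed pattern is copied from the regex printed above rather than rebuilt with rebus.

```r
library(stringr)

# Hypothetical sample string, modelled on the question's data
x <- "20.0/Ready Credit 17.0"

# Pattern with the optional parts in square brackets (character classes)
bracketed <- "[\\d]+\\.[\\d]+[/]?[[\\w]+]?"

# Pattern with the optional part in a parenthesised group
grouped <- "[\\d]+\\.[\\d]+(/[\\w]+)?"

# The character class lets through at most one character after the "/"
res_bracketed <- str_extract_all(x, bracketed)[[1]]

# The group lets through "/" followed by a whole word
res_grouped <- str_extract_all(x, grouped)[[1]]

print(res_bracketed)
print(res_grouped)
```

The first result should stop after the first letter of the word, while the second keeps the whole first word.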
You may even use this generated pattern directly for your convenience, as follows:
test %>%
str_extract_all(., '[\\d]+\\.[\\d]+(/[\\w]+)?')
With this pattern, you get the following output, which keeps the first word after each "/":
[[1]]
[1] "20.0/Ready" "17.0" "19.0/Gashaw" "20.0"
[[2]]
[1] "12.0/Splendid" "17.0" "20.0/Ranch" "14.0"
[[3]]
[1] "15.5/Norman" "19.0" "12.5"
[[4]]
[1] "15.0/Hell" "17.5/Double" "13.5" "11.5"
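If every word after the "/" should be kept, one possible extension is sketched below. It assumes the words consist only of ASCII letters and are separated by single spaces, which holds for the sample data but may not hold for yours. The name multi_word is just for illustration.

```r
library(stringr)

test <- c("20.0/Ready Credit 17.0 19.0/Gashaw Boko 20.0",
          "12.0/Splendid Justine 17.0 20.0/Ranch Pronto 14.0",
          "15.5/Norman Price 19.0 12.5",
          "15.0/Hell Broke Luce17.5/Double Boost 13.5 11.5")

# Assumed pattern: a decimal number, optionally followed by "/" and
# one or more letter-only words separated by single spaces
multi_word <- "\\d+\\.\\d+(/[A-Za-z]+( [A-Za-z]+)*)?"

res <- str_extract_all(test, multi_word)

# Fourth element, where a word runs straight into the next number
print(res[[4]])
```

Because the word part only accepts letters, a match ends where the next digit starts, so "Luce17.5" is split between two matches even without a space.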
Hope it helps.
Is this what you want?
library(stringr)
test <- c("20.0/Ready Credit 17.0 19.0/Gashaw Boko 20.0",
"12.0/Splendid Justine 17.0 20.0/Ranch Pronto 14.0",
"15.5/Norman Price 19.0 12.5",
"15.0/Hell Broke Luce17.5/Double Boost 13.5 11.5")
str_extract_all(test, "[0-9]+\\.[0-9]+[^0-9]*")
[[1]]
[1] "20.0/Ready Credit " "17.0 " "19.0/Gashaw Boko " "20.0"
[[2]]
[1] "12.0/Splendid Justine " "17.0 " "20.0/Ranch Pronto " "14.0"
[[3]]
[1] "15.5/Norman Price " "19.0 " "12.5"
[[4]]
[1] "15.0/Hell Broke Luce" "17.5/Double Boost " "13.5 " "11.5"
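Most of these matches carry a trailing space, because [^0-9]* also consumes the whitespace before the next number. If that is unwanted, a small follow-up step with str_trim should remove it. This is only a sketch, and the names raw and trimmed are made up for the example.

```r
library(stringr)

test <- c("20.0/Ready Credit 17.0 19.0/Gashaw Boko 20.0",
          "12.0/Splendid Justine 17.0 20.0/Ranch Pronto 14.0",
          "15.5/Norman Price 19.0 12.5",
          "15.0/Hell Broke Luce17.5/Double Boost 13.5 11.5")

# Same extraction as above: a number plus everything up to the next digit
raw <- str_extract_all(test, "[0-9]+\\.[0-9]+[^0-9]*")

# Strip leading and trailing whitespace from every match in every element
trimmed <- lapply(raw, str_trim)

print(trimmed[[1]])
```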