
R subset strings with stringr and rebus

I'm trying to use stringr and rebus to build a pattern to subset a bunch of strings. The strings I want have something in common: they all start and end with the same number. Elements 14 and 21 of the example data (e.g. "1 43 MAKARAINEN Kaisa FIN 0 20:51.8 0.0 1") show what I would like to subset. Those numbers can vary from 1 to 120.

Here is what I thought would work (I know this pattern does not require the two numbers to be the same; I don't know how to express that):

library(stringr)
library(rebus)

pattern <- START %R% one_or_more(DGT) %R% one_or_more(ANY_CHAR) %R% one_or_more(DGT) %R% END

str_subset(example, pattern)

What is the correct pattern? Bonus if it also requires the starting and ending numbers to be exactly the same, as that should make it foolproof.

Data:

example <- c("10. - 15. JAN 2017", "COMPETITION ANALYSIS", 
"WOMEN 7.5 KM SPRINT", "CHIEMGAU ARENA", "SAT 14 JAN 2017", "START TIME:", 
"END TIME:", "14:30", "15:47", "Rank Bib Name Nat T", "Loop1 Loop2 Loop3", 
"Result Behind Rank", "Time Behind Rank Time Behind Rank Time Behind Rank", 
"1 43 MAKARAINEN Kaisa FIN 0 20:51.8 0.0 1", "Cumulative Time 7:15.7 0.0 1 14:32.2 0.0 1 20:51.8 0.0 1", 
"Loop Time 7:15.7 0.0 1 7:16.5 0.0 1 6:19.6 0.0 1", "Shooting 0 33.0 +12.0 =41 0 30.0 +8.0 =42 0 1:03.0 +19.0 =48", 
"Range Time 55.5 +11.9 =35 51.9 +7.5 37 1:47.4 +18.5 38", "Course Time 6:14.5 0.0 1 6:19.9 0.0 1 6:19.6 0.0 1 18:54.0 0.0 1", 
"Penalty Time 5.7 4.7 10.4", "2 64 KOUKALOVA Gabriela CZE 0 21:13.8 +22.0 2", 
"Cumulative Time 7:24.6 +8.9 3 14:45.4 +13.2 2 21:13.8 +22.0 2"
)

If it does not have to be rebus and stringr, you can use base R's grepl() with a regular expression (regex), as shown below. Does that help?

# (\\d+) captures the leading number; \\1 requires that same number at the end
example[grepl("^(\\d+).+\\1$", example, perl = TRUE)]
# [1] "1 43 MAKARAINEN Kaisa FIN 0 20:51.8 0.0 1"
# [2] "2 64 KOUKALOVA Gabriela CZE 0 21:13.8 +22.0 2"
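To see why the back-reference matters, here is a small sketch in base R (grepl() with perl = TRUE) on a few rows copied from the question's data. A pattern that only asks for digits at both ends also keeps the date and time rows, while \\1 keeps only the rows whose final number repeats the first one.

```r
# A few rows copied from the question's `example` vector
rows <- c("10. - 15. JAN 2017", "14:30", "15:47",
          "1 43 MAKARAINEN Kaisa FIN 0 20:51.8 0.0 1",
          "2 64 KOUKALOVA Gabriela CZE 0 21:13.8 +22.0 2",
          "Cumulative Time 7:24.6 +8.9 3 14:45.4 +13.2 2 21:13.8 +22.0 2")

# Digits at both ends, but not necessarily the same digits
loose <- rows[grepl("^\\d+.+\\d+$", rows, perl = TRUE)]

# \\1 must repeat exactly the digits captured by (\\d+) at the start
strict <- rows[grepl("^(\\d+).+\\1$", rows, perl = TRUE)]

loose   # also keeps "10. - 15. JAN 2017", "14:30" and "15:47"
strict  # only the two result rows
```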

You may also restrict the captured number to the range 1 to 120 by replacing (\\d+) with (120|1[01]\\d|[1-9]\\d?). Note that ([1-120]) does not do this: [1-120] is a character class that matches a single character, either 1, 2 or 0.
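A quick way to check the range alternation is to anchor it and test a few boundary values. This is a sketch in base R; rank_1_120 and in_range() are hypothetical names used only for illustration, and the alternation assumes the numbers have no leading zeros.

```r
# Hypothetical pattern: 120, 100-119, 10-99 or 1-9 (no leading zeros)
rank_1_120 <- "(120|1[01]\\d|[1-9]\\d?)"

# Hypothetical helper: TRUE when the whole string is a number from 1 to 120
in_range <- function(x) grepl(paste0("^", rank_1_120, "$"), x, perl = TRUE)

in_range(c("1", "9", "10", "99", "100", "119", "120", "0", "121", "007"))

# [1-120] is a character class: exactly one of '1', '2' or '0'
grepl("^[1-120]$", c("1", "2", "0", "5", "12"), perl = TRUE)

# Combined with the back-reference: a rank of 1 to 120 at the start, repeated at the end
rows <- c("1 43 MAKARAINEN Kaisa FIN 0 20:51.8 0.0 1", "14:30")
kept <- rows[grepl(paste0("^", rank_1_120, ".+\\1$"), rows, perl = TRUE)]
kept
```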

I see I'm answering this quite late, and I'm not sure how that is received here, but as Manuel pointed out, what you want is a capture group, as he showed with a plain regex. If you are committed to rebus, however, all you need to add is a capture() call and a back-reference:

pattern <- START %R% capture(one_or_more(DGT)) %R% one_or_more(ANY_CHAR) %R% REF1 %R% END
str_subset(example, pattern)

As you can see, capture() grabs the one or more leading digits, and REF1 requires that exact sequence of digits to appear again after one or more other characters, immediately before the end of the string. I hope this helps someone; this is my first answer on this site.
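One caveat if you want the match to be truly foolproof: ^(\\d+).+\\1$ only checks that the string finishes with the captured digits, so a row like "1 43 SOMEONE X 11" still matches (\\1 = "1" matches the last digit of "11"). The sketch below uses made-up strings, not rows from the question, and assumes adding word boundaries (\\b) on both sides of the repeated number is acceptable for your data. The same "\\b" tokens can be spliced into the rebus pattern with %R%.

```r
# Hypothetical test strings (not from the question's data)
rows <- c("1 43 MAKARAINEN Kaisa FIN 0 20:51.8 0.0 1",  # same number at both ends
          "1 43 SOMEONE X 11",                          # 1 ... 11
          "11 43 SOMEONE X 1")                          # 11 ... 1

# Plain back-reference: the end only has to finish with the captured digits
plain <- grepl("^(\\d+).+\\1$", rows, perl = TRUE)

# \\b on both sides forces the last number to be exactly the captured one
bounded <- grepl("^(\\d+)\\b.+\\b\\1$", rows, perl = TRUE)

plain    # all three match
bounded  # only the first row matches
```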

The technical posts on this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM