Replicating a regular expression using the rebus package in R

I would like to create a pattern for the following text string using the rebus package in R.

My attempts are below, but I cannot get rid of the square brackets, so the rebus pattern does not give the same result as the plain regex in str_view(). Is there a tool or function that can translate a regular expression into rebus syntax? Rebus is a lot easier to read, which helps when sharing code with someone who might not be familiar with regex.

Pattern with regex:

pattern = "http.*for-sale.*5857"

I am trying to replicate this with the rebus package:

library(rebus)

pattern_rebus = "http" %R% zero_or_more(ANY_CHAR) %R% "for-sale" %R% zero_or_more(ANY_CHAR) %R% "5857"

as.regex(pattern_rebus)
<regex> http[.]*for-sale[.]*5857

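This appears to be a bug in rebus: it wraps every repeated ( one_or_more or zero_or_more ) pattern in [ and ] , that is, in a character class. Inside a character class, . is a literal dot, so [.]* matches only a run of dots rather than any characters. That is why .* has to be written manually: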

pattern_rebus = "http" %R% ".*" %R% "for-sale" %R% ".*5857"
as.regex(pattern_rebus)
## => <regex> http.*for-sale.*5857
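
To see why the bracketed version fails, the two patterns can be compared directly with base R. This is only a small sketch: the URL is a made-up example, not taken from the question.

```r
# Hypothetical URL, invented for this illustration only.
url <- "http://example.com/homes/for-sale/listing-5857"

broken  <- "http[.]*for-sale[.]*5857"  # what as.regex() printed for the rebus attempt
working <- "http.*for-sale.*5857"      # the original hand-written pattern

grepl(broken, url)    # FALSE: "[.]*" can only consume literal dots
grepl(working, url)   # TRUE:  ".*" consumes any characters

# The bracketed pattern matches only when nothing but dots separates the pieces.
grepl(broken, "http...for-sale..5857")   # TRUE
```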

Alternatively, you can stay within rebus by using a workaround: [\w\W] instead of . matches any character, including line breaks, if you use a PCRE regex (base R regex functions with perl = TRUE) or an ICU regex (stringr regex functions):

pattern_rebus = "http" %R% zero_or_more(char_class(WRD, NOT_WRD)) %R% "for-sale" %R% zero_or_more(char_class(WRD, NOT_WRD)) %R% "5857"
as.regex(pattern_rebus)
## => <regex> http[\w\W]*for-sale[\w\W]*5857

Or, if you want to match any character except CR and LF:

pattern_rebus = "http" %R% zero_or_more(negated_char_class("\\r\\n")) %R% "for-sale" %R% zero_or_more(negated_char_class("\\r\\n")) %R% "5857"
as.regex(pattern_rebus)
## => <regex> http[^\r\n]*for-sale[^\r\n]*5857
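
The two workarounds differ only when the text contains a line break. A minimal sketch with base R and perl = TRUE, using made-up input strings (the patterns are typed out as plain strings here, so rebus is not needed to run it):

```r
# Hypothetical inputs, invented for this illustration only.
one_line  <- "http://example.com/for-sale/5857"
two_lines <- "http://example.com/\nfor-sale/5857"

any_char   <- "http[\\w\\W]*for-sale[\\w\\W]*5857"     # any character at all
no_newline <- "http[^\\r\\n]*for-sale[^\\r\\n]*5857"   # anything except CR and LF

# On a single line both patterns behave the same.
grepl(any_char,   one_line, perl = TRUE)    # TRUE
grepl(no_newline, one_line, perl = TRUE)    # TRUE

# Across a line break only the [\w\W] version still matches.
grepl(any_char,   two_lines, perl = TRUE)   # TRUE
grepl(no_newline, two_lines, perl = TRUE)   # FALSE
```

Which one to pick depends on whether a match should be allowed to span several lines of the input.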
