Convert Python search regex to Ruby regex

I'm trying to convert the following Python regex to Ruby:

match = re.search(r'window.__APOLLO_STATE__ = JSON.parse\("(.+?)"\);', body)

I've done some digging, and Regexp#match seems to be what I'm looking for, but the following returns nil:

resp.body.match('^window.__APOLLO_STATE__ = JSON.parse\("(.+?)"\)')

How can I convert the regex and where am I wrong?

You may use

resp.body[/window\.__APOLLO_STATE__ = JSON\.parse\("(.*?)"\);/, 1]

Here,

  • /.../ is regex literal notation, which is convenient for defining regex patterns
  • Literal dots are escaped; an unescaped dot matches any character except line break characters
  • The .+? is changed to .*? so that empty values can be matched. With .+?, an empty value forces the pattern to consume the closing quote and overmatch; it is easier to discard empty matches later than to fix overmatches
  • 1 tells the engine to return the value of the capturing group with ID 1 of the first match. If you need multiple matches, use resp.body.scan(/regex/).
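
The points above can be sketched as follows. This is only an illustration: the sample body string is made up here, and the real response body will look different.

```ruby
# Sample input is an assumption made up for this sketch.
body = '<script>window.__APOLLO_STATE__ = JSON.parse("my dog has fleas");</script>'
re = /window\.__APOLLO_STATE__ = JSON\.parse\("(.*?)"\);/

first   = body[re, 1]                # capture group 1 of the first match
missing = "no state here"[re, 1]     # nil when the pattern does not match
empty   = 'window.__APOLLO_STATE__ = JSON.parse("");'[re, 1]  # .*? allows an empty value
all     = (body * 2).scan(re).flatten  # scan yields one array of captures per match

p first, missing, empty, all
```

String#[] with a regex and a group index returns nil rather than raising when nothing matches, so the result should be checked before use.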

An idiomatic way is to use the =~ regex match operator:

resp.body =~ /^window.__APOLLO_STATE__ = JSON.parse\("(.+?)"\)/

You can access the capture groups with $1, $2, and so on.

If you don't like the global variable usage, you can also use the Regexp#match method:

result = /^window.__APOLLO_STATE__ = JSON.parse\("(.+?)"\)/.match(resp.body)
result[1] # => returns first capture group
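
A small sketch comparing the two approaches, assuming a made-up multi-line body (the sample strings are hypothetical). It also shows that ^ in Ruby anchors at the start of any line, so leading whitespace on that line prevents a match:

```ruby
# Sample input is an assumption made up for this sketch.
body = "<script>\nwindow.__APOLLO_STATE__ = JSON.parse(\"my dog has fleas\");\n</script>"
re = /^window\.__APOLLO_STATE__ = JSON\.parse\("(.+?)"\)/

index = (body =~ re)   # offset where the match starts, or nil
via_global = $1        # first capture group of the last successful match

result = re.match(body)                 # MatchData, or nil when nothing matches
via_matchdata = result && result[1]

# ^ matches only at the start of a line, so an indented line does not match.
indented = "  window.__APOLLO_STATE__ = JSON.parse(\"x\");"
indented_result = re.match(indented)

p index, via_global, via_matchdata, indented_result
```

Both =~ and Regexp#match return nil on failure, so guard before reading $1 or indexing the result.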

As I understand, your string is something like

str = 'window.__APOLLO_STATE__ = JSON.parse("my dog has fleas");'

and you wish to extract the text between the double quotes. You can do that with the following regular expression, which does not employ a capture group:

r = /\Awindow\.__APOLLO_STATE__ = JSON\.parse\(\"\K.+?(?=\"\);\z)/

str[r]
  #=> "my dog has fleas"

The regular expression can be written in free-spacing mode to make it self-documenting:

r = /
    \A          # match beginning of string
    window\.__APOLLO_STATE__\ =\ JSON\.parse\(\"
                # match substring
    \K          # discard everything matched so far 
    .+?         # match 1+ characters, lazily
    (?=\"\);\z) # match "); followed by end-of-string (positive lookahead)
    /x          # free-spacing regex definition mode

The contents of a positive lookahead must be matched but are not part of the match returned. Likewise, the text matched before the \K directive is not part of the match returned.

Free-spacing mode removes all whitespace before the expression is parsed. Accordingly, any intended spaces (in "APOLLO_STATE__ = JSON", for example) must be protected. I've done that by escaping the spaces, which is one of several ways to do it.
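
As a rough illustration of protecting spaces in free-spacing mode, the sketch below compares escaped spaces, spaces placed inside a character class, and unprotected spaces. The variable names are made up for this example.

```ruby
str = 'window.__APOLLO_STATE__ = JSON.parse("my dog has fleas");'

escaped = /
  \A window\.__APOLLO_STATE__\ =\ JSON\.parse\("   # spaces escaped with a backslash
  \K .+? (?="\);\z)
/x

bracketed = /
  \A window\.__APOLLO_STATE__[ ]=[ ]JSON\.parse\("  # spaces wrapped in a character class
  \K .+? (?="\);\z)
/x

# Unprotected spaces are dropped, so this only matches input written without them.
unprotected = /\A window\.__APOLLO_STATE__ = JSON\.parse\("/x

from_escaped   = str[escaped]
from_bracketed = str[bracketed]
with_spaces    = (str =~ unprotected)
without_spaces = ('window.__APOLLO_STATE__=JSON.parse("x");' =~ unprotected)

p from_escaped, from_bracketed, with_spaces, without_spaces
```

Because of \K and the lookahead, str[escaped] returns only the text between the quotes, with no capture group needed.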
