
Regex in R lookbehind assertion

I'm trying to do some pattern matching with the extract function from tidyr. I've tested my regex on a regex practice site, where the pattern seems to work, and I am using a lookbehind assertion.

I have the following sample text:

=[\"{ Key = source, Values = web,videoTag,assist }\",\"{ Key = type, 
Values = attack }\",\"{ Key = team, Values = 2 }\",\"{ Key = 
originalStartTimeMs, Values = 56496 }\",\"{ Key = linkId, Values = 
1551292895649 }\",\"{ Key = playerJersey, Values = 8 }\",\"{ Key = 
attackLocationStartX, Values = 3.9375 }\",\"{ Key = 
attackLocationStartY, Values = 0.739376770538243 }\",\"{ Key = 
attackLocationStartDeflected, Values = false }\",\"{ Key = 
attackLocationEndX, Values = 1.7897727272727275 }\",\"{ Key = 
attackLocationEndY, Values = -1.3002832861189795 }\",\"{ Key = 
attackLocationEndDeflected, Values = false }\",\"{ Key = lastModified, 
Values = web,videoTag,assist 

I want to grab the numbers following attackLocationX (all numbers following any text about an attack location).

Using the following code with lookbehind assertion, however, I get no results:

df %>% 
  extract(message, "x_start", '((?<=attackLocationStartX,/sValues/s=/s)[0-9.]+)')

This function returns NA if no pattern match is found, and my target column is all NA values even though I tested the pattern on www.regexr.com. According to the documentation, R pattern matching supports lookbehind assertions, so I'm not sure what else to do here.

I'm not sure about the lookbehind part, but in R, you need to escape backslashes. This isn't obvious if you are using a regex checker that isn't R-specific.

More info here .

So you might want your regex to look something like:

"((?<=attackLocationStartX,\\sValues\\s=\\s)[0-9.]+)"

First of all, to match whitespace you need \\s, not /s.
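To see the difference between /s and \\s in isolation, here is a minimal base R sketch. It assumes the message is a single string (the variable x below is a shortened, hypothetical copy of the sample text) and uses regexpr with perl = TRUE, which selects the PCRE engine and so supports lookbehind. It does not reproduce how tidyr's extract runs the pattern internally.

```r
# Shortened, hypothetical copy of the sample text from the question.
x <- "{ Key = attackLocationStartX, Values = 3.9375 }\",\"{ Key = attackLocationStartY, Values = 0.739376770538243 }"

# "/s" is a literal slash followed by "s", so the lookbehind can never match.
bad <- "(?<=attackLocationStartX,/sValues/s=/s)[0-9.]+"
# "\\s" inside an R string is the regex token \s (one whitespace character).
good <- "(?<=attackLocationStartX,\\sValues\\s=\\s)[0-9.]+"

# perl = TRUE switches to PCRE, which understands (?<=...) lookbehind.
bad_hit  <- regmatches(x, regexpr(bad,  x, perl = TRUE))
good_hit <- regmatches(x, regexpr(good, x, perl = TRUE))

print(bad_hit)   # character(0)
print(good_hit)  # [1] "3.9375"
```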

You do not have to use a lookbehind here, because extract returns the captured substrings when the pattern contains capturing group(s).

Use

df %>% 
  extract(message, "x_start", "attackLocationStartX\\s*,\\s*Values\\s*=\\s*(-?\\d+\\.\\d+)")

Output: 3.9375.
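For a reproducible check, the call above can be run on a small stand-in data frame. This is only a sketch: the one-row df and its shortened message value are assumptions, and remove = FALSE and convert = TRUE are optional arguments added here to keep the source column and return a number instead of text.

```r
library(tidyr)

# Hypothetical one-row data frame; `message` holds a shortened copy of the sample text.
df <- data.frame(
  message = "{ Key = attackLocationStartX, Values = 3.9375 }\",\"{ Key = attackLocationStartY, Values = 0.739376770538243 }",
  stringsAsFactors = FALSE
)

res <- tidyr::extract(
  df, message, "x_start",
  "attackLocationStartX\\s*,\\s*Values\\s*=\\s*(-?\\d+\\.\\d+)",
  remove = FALSE,  # keep the original column for inspection
  convert = TRUE   # convert the captured text to a numeric column
)

print(res$x_start)  # [1] 3.9375
```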

The regex may also look like "attackLocationStartX\\s*,\\s*Values\\s*=\\s*(-?\\d[.0-9]*)".

As the (-?\\d+\\.\\d+) part is captured, only the text in this group will be the output.
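Because extract fills one new column per capturing group, the same idea may extend to several attack-location values at once. The sketch below is an assumption-laden illustration, not part of the original answer: the data frame, the helper strings num and sep, and the column names are made up, and the pattern assumes StartX appears before StartY on a single line.

```r
library(tidyr)

# Hypothetical one-row data frame with a shortened copy of the sample text.
df <- data.frame(
  message = "{ Key = attackLocationStartX, Values = 3.9375 }\",\"{ Key = attackLocationStartY, Values = 0.739376770538243 }",
  stringsAsFactors = FALSE
)

num <- "(-?\\d[.0-9]*)"              # one capturing group per number
sep <- "\\s*,\\s*Values\\s*=\\s*"    # the text between a key and its value

# Two capturing groups, joined by a lazy ".*?" that skips the text in between.
pattern <- paste0("attackLocationStartX", sep, num,
                  ".*?",
                  "attackLocationStartY", sep, num)

res <- tidyr::extract(df, message, c("x_start", "y_start"), pattern, convert = TRUE)

print(res)
```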

Pattern details

  • (-?\\d+\\.\\d+) - a capturing group that matches
    • -? - an optional hyphen ( ? means 1 or 0 occurrences )
    • \\d+ - 1 or more digits ( + means 1 or more )
    • \\. - a dot
    • \\d+ - 1 or more digits
  • \\d[.0-9]* - a digit ( \\d ), followed by 0 or more dots or digits ( [.0-9]* )
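The two number sub-patterns above do not accept the same inputs. The short base R sketch below anchors each with ^ and $ and tests them on a few values, some taken from the sample text and one ("1.2.3") made up to show the trade-off. The anchors and perl = TRUE are additions for this illustration only.

```r
# Candidate values: decimals and integers from the sample text, plus a malformed one.
values <- c("3.9375", "-1.3002832861189795", "8", "56496", "1.2.3")

strict <- "^-?\\d+\\.\\d+$"   # requires digits, one dot, digits
loose  <- "^-?\\d[.0-9]*$"    # a digit, then any mix of dots and digits

strict_ok <- grepl(strict, values, perl = TRUE)
loose_ok  <- grepl(loose,  values, perl = TRUE)

print(strict_ok)  # [1]  TRUE  TRUE FALSE FALSE FALSE
print(loose_ok)   # [1] TRUE TRUE TRUE TRUE TRUE
```

The stricter form skips integer values such as 8 or 56496, while the looser form also accepts malformed input such as 1.2.3, so the choice depends on which values the column is expected to hold.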


 