Regex in R lookbehind assertion

Question

I'm trying to do some pattern matching with the extract function from tidyr. I've tested my regex on a regex practice site and the pattern seems to work; it uses a lookbehind assertion.

I have the following sample text:

=[\"{ Key = source, Values = web,videoTag,assist }\",
\"{ Key = type, Values = attack }\",
\"{ Key = team, Values = 2 }\",
\"{ Key = originalStartTimeMs, Values = 56496 }\",
\"{ Key = linkId, Values = 1551292895649 }\",
\"{ Key = playerJersey, Values = 8 }\",
\"{ Key = attackLocationStartX, Values = 3.9375 }\",
\"{ Key = attackLocationStartY, Values = 0.739376770538243 }\",
\"{ Key = attackLocationStartDeflected, Values = false }\",
\"{ Key = attackLocationEndX, Values = 1.7897727272727275 }\",
\"{ Key = attackLocationEndY, Values = -1.3002832861189795 }\",
\"{ Key = attackLocationEndDeflected, Values = false }\",
\"{ Key = lastModified, Values = web,videoTag,assist

I want to grab the numbers that follow the attack location keys such as attackLocationStartX (i.e. all numbers following any text about an attack location).

However, when I use the following code with a lookbehind assertion, I get no results:

df %>% 
  extract(message, "x_start", '((?<=attackLocationStartX,/sValues/s=/s)[0-9.]+)')

extract returns NA when the pattern does not match, and my target column is all NA even though I tested the pattern on www.regexr.com. According to the documentation, R's pattern matching supports lookbehind assertions, so I'm not sure what else to try.

Answer 1

I'm not sure about the lookbehind part, but in R you need to escape backslashes: the regex token \s has to be written as "\\s" inside an R string. This isn't obvious if you are using a regex checker that isn't R-specific.
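To illustrate the escaping point, here is a minimal base R sketch (the string x is a made-up fragment of the sample message, not real data). It shows that "\\s" in R source is the two-character regex \s, while /s is just a literal slash followed by "s", which would explain why the question's pattern never matches.

```r
# "\\s" in R source is two characters: a backslash and "s".
# The regex engine then reads \s as "any whitespace character".
# (Writing "\s" directly is a syntax error: '\s' is an unrecognized escape.)
n_chars <- nchar("\\s")
print(n_chars)  # 2

# Hypothetical fragment of the sample message.
x <- "attackLocationStartX, Values = 3.9375"

# Escaped backslash: \s matches the space after the comma.
escaped <- grepl("StartX,\\sValues", x)
print(escaped)  # TRUE

# Forward slash: /s is a literal "/" followed by "s", which does not occur in x.
slashed <- grepl("StartX,/sValues", x)
print(slashed)  # FALSE
```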

More info here.

So you might want your regex to look something like:

"((?<=attackLocationStartX,\\sValues\\s=\\s)[0-9.]+)"
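One way to check that the escaped lookbehind pattern itself works in R is to test it with base R instead of tidyr. This sketch assumes the message is a single string (msg below is a shortened, hypothetical copy of the sample) and uses regexpr() with perl = TRUE, because the PCRE engine supports lookbehind assertions. The outer capturing group is dropped here since regexpr() returns the whole match; extract() still needs a group.

```r
# Shortened, hypothetical copy of the sample message as one string.
msg <- paste0(
  '{ Key = attackLocationStartX, Values = 3.9375 }","',
  '{ Key = attackLocationStartY, Values = 0.739376770538243 }"'
)

# Lookbehind: match digits/dots only when preceded by the fixed-length prefix.
pattern <- "(?<=attackLocationStartX,\\sValues\\s=\\s)[0-9.]+"

# perl = TRUE selects PCRE, which supports lookbehind assertions.
x_start <- regmatches(msg, regexpr(pattern, msg, perl = TRUE))
print(x_start)  # "3.9375"
```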

Answer 2

First of all, to match whitespace you need \\s, not /s.

You do not need a lookbehind here: extract returns only the captured substrings when the pattern contains capturing groups (one group per column named in into).
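To see what "returns only the captured substrings" means, here is a base R sketch using regexec(), which, like extract(), reports each capturing group separately from the full match. The shortened msg string is a hypothetical copy of the sample, and this is not tidyr's own implementation.

```r
# Shortened, hypothetical copy of the sample message as one string.
msg <- '{ Key = attackLocationStartX, Values = 3.9375 }","{ Key = attackLocationEndY, Values = -1.3002832861189795 }"'

# No lookbehind: the prefix is matched normally and only the group is kept.
pattern <- "attackLocationStartX\\s*,\\s*Values\\s*=\\s*(-?\\d+\\.\\d+)"

# regexec() returns the full match first, followed by each capturing group.
m <- regmatches(msg, regexec(pattern, msg))[[1]]
print(m[1])  # full match: "attackLocationStartX, Values = 3.9375"
print(m[2])  # capturing group only: "3.9375"
```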

Use

library(tidyr)  # provides extract() and re-exports %>%

df %>% 
  extract(message, "x_start", "attackLocationStartX\\s*,\\s*Values\\s*=\\s*(-?\\d+\\.\\d+)")

Output: 3.9375.
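The pattern above targets only attackLocationStartX, while the question asks for every attack location number. A possible base R sketch for all of them, assuming the message is one string (msg is a shortened, hypothetical copy of the sample): gregexpr() finds every key/value pair whose value starts with a digit or minus sign, so the false values of the *Deflected keys are skipped, and sub() then splits each hit into key and number. The names msg, pattern, hits and coords are illustrative only.

```r
# Shortened, hypothetical copy of the sample message as one string.
msg <- paste0(
  '{ Key = attackLocationStartX, Values = 3.9375 }","',
  '{ Key = attackLocationStartY, Values = 0.739376770538243 }","',
  '{ Key = attackLocationStartDeflected, Values = false }","',
  '{ Key = attackLocationEndX, Values = 1.7897727272727275 }","',
  '{ Key = attackLocationEndY, Values = -1.3002832861189795 }"'
)

# Key name plus a numeric value; "false" fails because -?\d needs a digit.
pattern <- "attackLocation\\w+\\s*,\\s*Values\\s*=\\s*-?\\d[.0-9]*"

# All matches in the string, e.g. "attackLocationStartX, Values = 3.9375".
hits <- regmatches(msg, gregexpr(pattern, msg, perl = TRUE))[[1]]

keys <- sub("\\s*,.*$", "", hits)              # text before the comma
vals <- as.numeric(sub("^.*=\\s*", "", hits))  # text after "="
coords <- setNames(vals, keys)
print(coords)
```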

The regex may also look like "attackLocationStartX\\s*,\\s*Values\\s*=\\s*(-?\\d[.0-9]*)".

As the (-?\\d+\\.\\d+) part is captured, only the text in this group is returned.

Pattern details

(-?\\d+\\.\\d+) - a capturing group that matches
- -? - an optional hyphen (? means 1 or 0 occurrences)
- \\d+ - 1 or more digits (+ means 1 or more)
- \\. - a dot
- \\d+ - 1 or more digits
\\d[.0-9]* - in the alternative regex, a digit (\\d) followed by 0 or more dots or digits ([.0-9]*)
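The two number patterns differ in what they accept. This sketch tests them on a few values from the sample plus one hypothetical malformed string; the ^ and $ anchors are added only so that whole strings are compared and are not part of the answer's regexes.

```r
# Values from the sample message, plus a hypothetical malformed string.
values <- c("3.9375", "-1.3002832861189795", "8", "56496", "1..2")

strict <- "^-?\\d+\\.\\d+$"  # digits, a dot, then digits: requires a decimal point
loose  <- "^-?\\d[.0-9]*$"   # a digit, then any mix of dots and digits

strict_hits <- grepl(strict, values)
loose_hits  <- grepl(loose, values)

print(strict_hits)  # TRUE TRUE FALSE FALSE FALSE
print(loose_hits)   # TRUE TRUE TRUE TRUE TRUE
```

The strict form rejects integers such as the jersey number 8, while the loose form accepts them but also lets through strings like "1..2".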

solution1 0 2019-03-21 14:14:04
solution2 0 ACCEPTED 2019-03-21 14:14:10