Regex lookbehind assertion in R

Question

I'm trying to do some pattern matching with the extract function from tidyr. I've tested my regex on a regex practice site, where the pattern seems to work, and it uses a lookbehind assertion.

I have the following sample text:

=[\"{ Key = source, Values = web,videoTag,assist }\",\"{ Key = type, 
Values = attack }\",\"{ Key = team, Values = 2 }\",\"{ Key = 
originalStartTimeMs, Values = 56496 }\",\"{ Key = linkId, Values = 
1551292895649 }\",\"{ Key = playerJersey, Values = 8 }\",\"{ Key = 
attackLocationStartX, Values = 3.9375 }\",\"{ Key = 
attackLocationStartY, Values = 0.739376770538243 }\",\"{ Key = 
attackLocationStartDeflected, Values = false }\",\"{ Key = 
attackLocationEndX, Values = 1.7897727272727275 }\",\"{ Key = 
attackLocationEndY, Values = -1.3002832861189795 }\",\"{ Key = 
attackLocationEndDeflected, Values = false }\",\"{ Key = lastModified, 
Values = web,videoTag,assist

I want to grab the numbers following attackLocationX (i.e., all numbers following any text about an attack location).

However, when I use the following code with a lookbehind assertion, I get no results:

df %>% 
  extract(message, "x_start", '((?<=attackLocationStartX,/sValues/s=/s)[0-9.]+)')

This function returns NA if no match is found, and my target column is all NA values even though I tested the pattern on www.regexr.com. According to the documentation, R pattern matching supports lookbehind assertions, so I'm not sure what else to try.

Answer 1

I'm not sure about the lookbehind part, but in R string literals you need to escape backslashes (write \\s to get the regex \s). This isn't obvious if you are using a regex checker that isn't R-specific.

More info here.

So you might want your regex to look something like:

"((?<=attackLocationStartX,\\sValues\\s=\\s)[0-9.]+)"
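A minimal base-R sketch of the escaping point, using a hypothetical shortened string `msg`: in an R string literal "/s" is just a slash followed by an s, while "\\s" reaches the regex engine as \s (any whitespace). The lookbehind is run with `perl = TRUE` because in base R lookaround is a PCRE feature. The `regmatches()`/`regexpr()` pairing is only a stand-in for `extract()`, not a claim about how tidyr works internally.

```r
# Hypothetical, shortened excerpt of one value of the message column
msg <- "{ Key = attackLocationStartX, Values = 3.9375 }"

# "\\s" in R source is the two characters \ and s, which the regex
# engine reads as "any whitespace"; "/s" is a literal slash + "s"
print(nchar("\\s"))                    # 2 characters
print(grepl("StartX,/sValues", msg))   # FALSE: the text has no "/s"
print(grepl("StartX,\\sValues", msg))  # TRUE: \s matches the space

# Lookbehind with correctly escaped \s; perl = TRUE selects PCRE
pattern <- "(?<=attackLocationStartX,\\sValues\\s=\\s)[0-9.]+"
x_start <- regmatches(msg, regexpr(pattern, msg, perl = TRUE))
print(x_start)                         # "3.9375"
```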

Answer 2

First of all, to match whitespace you need \\s, not /s.

You do not need a lookbehind here: when the pattern contains capturing group(s), extract returns only the captured substrings.
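To illustrate the capture-group idea outside tidyr, here is a base-R sketch with `regexec()`, which reports the whole match followed by each capture group. The string `msg` is a hypothetical shortened excerpt; per the answer, `extract()` keeps only the captured group as the new column, which corresponds to the second element below.

```r
# Hypothetical, shortened excerpt of one value of the message column
msg <- "{ Key = attackLocationStartX, Values = 3.9375 }"

# No lookbehind: the key text is matched normally, the number is captured
pattern <- "attackLocationStartX\\s*,\\s*Values\\s*=\\s*(-?\\d+\\.\\d+)"
m <- regmatches(msg, regexec(pattern, msg))[[1]]

print(m[1])  # whole match: "attackLocationStartX, Values = 3.9375"
print(m[2])  # capture group 1 only: "3.9375"
```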

Use

library(tidyr)  # provides extract() and re-exports %>%

df %>% 
  extract(message, "x_start", "attackLocationStartX\\s*,\\s*Values\\s*=\\s*(-?\\d+\\.\\d+)")

Output: 3.9375.

The regex may also be written as "attackLocationStartX\\s*,\\s*Values\\s*=\\s*(-?\\d[.0-9]*)", which also accepts numbers without a decimal part.
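The two patterns differ on numbers without a decimal part: `-?\\d+\\.\\d+` requires a dot, while `-?\\d[.0-9]*` also accepts integers. A small base-R comparison, assuming hypothetical test strings in `values` modelled on the sample text (the last one mimics an integer value such as `Values = 2`); `capture1()` is a hypothetical helper, not part of any package.

```r
# Hypothetical strings modelled on the sample text
values <- c("Values = 3.9375", "Values = -1.3002832861189795", "Values = 2")

strict  <- "Values\\s*=\\s*(-?\\d+\\.\\d+)"  # requires a decimal point
lenient <- "Values\\s*=\\s*(-?\\d[.0-9]*)"   # a digit, then digits or dots

# Hypothetical helper: capture group 1 for each string, NA if no match
capture1 <- function(x, pattern) {
  vapply(regmatches(x, regexec(pattern, x)),
         function(m) if (length(m) > 1) m[2] else NA_character_,
         character(1))
}

print(capture1(values, strict))   # "3.9375", "-1.3002832861189795", NA
print(capture1(values, lenient))  # "3.9375", "-1.3002832861189795", "2"
```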

As the (-?\\d+\\.\\d+) part is captured, only the text matched by this group is returned.

Pattern details

- (-?\\d+\\.\\d+) - a capturing group that matches:
  - -? - an optional hyphen, i.e. a minus sign (? means 1 or 0 occurrences)
  - \\d+ - 1 or more digits (+ means 1 or more)
  - \\. - a literal dot
  - \\d+ - 1 or more digits
- \\d[.0-9]* - a digit (\\d), followed by 0 or more dots or digits ([.0-9]*)
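Combining these pattern pieces, the question's wider goal (every number after an attack-location key) can be sketched by turning the varying part of the key into a second capture group. This is a base-R illustration using `gregexpr()` and `sub()` on a hypothetical shortened `msg`; the `attackLocation(\\w+)` group is an assumption added here, not part of the answer's pattern. Non-numeric values such as `false` simply do not match.

```r
# Hypothetical, shortened excerpt of the sample message
msg <- paste(
  "{ Key = attackLocationStartX, Values = 3.9375 }",
  "{ Key = attackLocationStartY, Values = 0.739376770538243 }",
  "{ Key = attackLocationStartDeflected, Values = false }",
  "{ Key = attackLocationEndX, Values = 1.7897727272727275 }",
  "{ Key = attackLocationEndY, Values = -1.3002832861189795 }"
)

# Group 1: the part of the key after "attackLocation"; group 2: the number
pattern <- "attackLocation(\\w+)\\s*,\\s*Values\\s*=\\s*(-?\\d[.0-9]*)"

# Collect every full match, then pull each group out with sub()
hits <- regmatches(msg, gregexpr(pattern, msg, perl = TRUE))[[1]]
coords <- data.frame(
  key   = sub(pattern, "\\1", hits, perl = TRUE),
  value = as.numeric(sub(pattern, "\\2", hits, perl = TRUE))
)
print(coords)  # four rows: StartX, StartY, EndX, EndY
```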

2 answers

Answer 1
0 2019-03-21 14:14:04

Answer 2
0 Accepted 2019-03-21 14:14:10