Regex in R lookbehind assertion
I'm trying to do some pattern matching with the extract function from tidyr. I've tested my regex on a regex practice site, where the pattern seems to work, and I am using a lookbehind assertion.
I have the following sample text:
=[\"{ Key = source, Values = web,videoTag,assist }\",\"{ Key = type,
Values = attack }\",\"{ Key = team, Values = 2 }\",\"{ Key =
originalStartTimeMs, Values = 56496 }\",\"{ Key = linkId, Values =
1551292895649 }\",\"{ Key = playerJersey, Values = 8 }\",\"{ Key =
attackLocationStartX, Values = 3.9375 }\",\"{ Key =
attackLocationStartY, Values = 0.739376770538243 }\",\"{ Key =
attackLocationStartDeflected, Values = false }\",\"{ Key =
attackLocationEndX, Values = 1.7897727272727275 }\",\"{ Key =
attackLocationEndY, Values = -1.3002832861189795 }\",\"{ Key =
attackLocationEndDeflected, Values = false }\",\"{ Key = lastModified,
Values = web,videoTag,assist
I want to grab the numbers following attackLocationX (that is, all numbers following any text about an attack location).
Using the following code with a lookbehind assertion, however, I get no results:
df %>%
  extract(message, "x_start", '((?<=attackLocationStartX,/sValues/s=/s)[0-9.]+)')
This function returns NA if no pattern match is found, and my target column is all NA values even though I tested the pattern on www.regexr.com. According to the documentation, R pattern matching supports lookbehind assertions, so I'm not sure what else to do here.
I'm not sure about the lookbehind part, but in R, you need to escape backslashes. This isn't obvious if you are using a regex checker that isn't R-specific.
So you might want your regex to look something like:
"(?<=attackLocationStartX,\\sValues\\s=\\s)[0-9.]+"
First of all, to match whitespace you need \\s, not /s.
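To see the difference in isolation, here is a minimal base R sketch. The string x is a hypothetical, shortened stand-in for the message column, and perl = TRUE is passed so that the PCRE engine (which supports lookbehind) is used; it is not the tidyr call from the question.

```r
# Hypothetical, shortened sample of the message text from the question.
x <- "{ Key = attackLocationStartX, Values = 3.9375 }"

# "/s" is a literal slash followed by "s", so it cannot match the space.
slash_hit <- grepl("StartX,/sValues", x, perl = TRUE)

# "\\s" in R source is the regex token \s (any whitespace character).
escaped_hit <- grepl("StartX,\\sValues", x, perl = TRUE)

# With the escapes fixed, the lookbehind from the question matches under PCRE.
pattern <- "(?<=attackLocationStartX,\\sValues\\s=\\s)[0-9.]+"
m <- regmatches(x, regexpr(pattern, x, perl = TRUE))

print(c(slash_hit, escaped_hit))
print(m)
```

This suggests the lookbehind itself was never the problem; the /s typo alone is enough to make every row come back as NA.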
You do not have to use a lookbehind here, as extract will return the captured substrings if capturing group(s) are used in the pattern.
Use
df %>%
extract(message, "x_start", "attackLocationStartX\\s*,\\s*Values\\s*=\\s*(-?\\d+\\.\\d+)")
Output: 3.9375.
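For a reproducible check, a small self-contained sketch follows. The one-row data frame is made up for illustration (only the message column name comes from the question), and remove = FALSE is added just to keep the input column visible in the result.

```r
library(tidyr)

# Hypothetical one-row data frame; message holds a shortened sample string.
df <- data.frame(
  message = "{ Key = attackLocationStartX, Values = 3.9375 }",
  stringsAsFactors = FALSE
)

# The single capturing group supplies the value for the new x_start column.
result <- extract(
  df, message, "x_start",
  "attackLocationStartX\\s*,\\s*Values\\s*=\\s*(-?\\d+\\.\\d+)",
  remove = FALSE
)

print(result$x_start)
```

The new column is character by default; passing convert = TRUE to extract should turn it into a numeric column if that is what you need.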
The regex may also look like "attackLocationStartX\\s*,\\s*Values\\s*=\\s*(-?\\d[.0-9]*)".
As the (-?\\d+\\.\\d+) part is captured, only the text in this group will be the output.
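The split between the whole match and the captured group can be inspected in base R. This is only an illustrative sketch with a hypothetical sample string; it uses regexec and regmatches rather than tidyr, so it shows the general capture behaviour, not what extract does internally.

```r
# Hypothetical, shortened sample of the message text from the question.
x <- "{ Key = attackLocationStartX, Values = 3.9375 }"
pattern <- "attackLocationStartX\\s*,\\s*Values\\s*=\\s*(-?\\d+\\.\\d+)"

# regexec() reports the full match first, then each capturing group.
parts <- regmatches(x, regexec(pattern, x, perl = TRUE))[[1]]
full_match <- parts[1]  # everything the pattern matched
group_1 <- parts[2]     # only the parenthesised part

print(full_match)
print(group_1)
```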
Pattern details
(-?\\d+\\.\\d+) - a capturing group that matches:
  -? - an optional hyphen (? means 1 or 0 occurrences)
  \\d+ - 1 or more digits (+ means 1 or more)
  \\. - a dot
  \\d+ - 1 or more digits
\\d[.0-9]* - a digit (\\d), followed by 0 or more dots or digits ([.0-9]*)
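The two capture patterns differ in how strict they are, which a quick base R comparison can show. The test values below are picked from the sample text, and the ^ and $ anchors are added here only so that each value is checked as a whole; they are not part of the patterns above.

```r
# Values taken from the sample text: two decimals and two integers.
values <- c("3.9375", "-1.3002832861189795", "8", "56496")

strict <- "^-?\\d+\\.\\d+$"  # requires digits, a dot, then digits
loose <- "^-?\\d[.0-9]*$"    # a digit, then any run of dots or digits

strict_ok <- grepl(strict, values, perl = TRUE)
loose_ok <- grepl(loose, values, perl = TRUE)

print(data.frame(values, strict_ok, loose_ok))
```

The stricter pattern skips integer values such as 8, while the looser one accepts them but would also accept malformed input such as 1.2.3, so the choice depends on what the field can contain.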