Extract all numbers in a string that are surrounded by specific patterns in R

Question

I'd like to extract all numbers in a string that are flanked by two markers/patterns. However, regular expressions in R are my bane.

I have something like this:

string  <- "<img src='images/stimuli/32.png' style='width:341.38790035587186px;height: 265px;'><img src='images/stimuli/36.png' style='width:341.38790035587186px;height: 265px;'>"
marker1 <- "images/stimuli/"
marker2 <- ".png"

and want something like this:

gsub(paste0(".*", marker1, "*(.*?) *", marker2, ".*"), "\\1", string)

[1] "32" "36"

However, I get this:

[1] "32"

PS: If someone has a good guide to understanding how regular expressions work here, please let me know. I am pretty sure that the answer is pretty simple, but I just don't get regex :(

Answer 1

You may use

string  <- "<img src='images/stimuli/32.png' style='width:341.38790035587186px;height: 265px;'><img src='images/stimuli/36.png' style='width:341.38790035587186px;height: 265px;'>"
regmatches(string, gregexpr("images/stimuli/\\K\\d+(?=\\.png)", string, perl=TRUE))[[1]]
# => [1] "32" "36"

NOTE: If there can be anything between the markers, not just numbers, you may replace \\d+ with .*?.
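As a small sketch of that note, the pattern below swaps \\d+ for the lazy .*?. The file name cat_01.png is a made-up example and is not part of the original question.

```r
# Shortened input with one hypothetical non-numeric file name (cat_01.png)
string <- "<img src='images/stimuli/cat_01.png'><img src='images/stimuli/36.png'>"

# .*? is lazy, so it stops at the first ".png" that follows each marker
result <- regmatches(string, gregexpr("images/stimuli/\\K.*?(?=\\.png)", string, perl = TRUE))[[1]]
result
# [1] "cat_01" "36"
```

A greedy .* would instead run to the last .png in the string, so the lazy quantifier matters here.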

See the R demo and a regex demo.

regmatches combined with gregexpr extracts all matches found in the input.
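To illustrate why the answer ends with [[1]], here is a minimal sketch with a vector of several inputs. The extra strings are invented for the example. regmatches returns a list with one character vector per input element.

```r
# Three inputs: two matches, one match, and no match (the last two are made up)
strings <- c(
  "<img src='images/stimuli/32.png'><img src='images/stimuli/36.png'>",
  "<img src='images/stimuli/7.png'>",
  "<p>no image here</p>"
)
pattern <- "images/stimuli/\\K\\d+(?=\\.png)"

# One list element per input string; an input without matches gives character(0)
matches <- regmatches(strings, gregexpr(pattern, strings, perl = TRUE))
lengths(matches)
# [1] 2 1 0
```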

The regex matches:

images/stimuli/ - a literal string
\K - a match reset operator that discards all text matched so far
\d+ - one or more digits
(?=\.png) - a positive lookahead that requires a .png substring immediately to the right without adding it to the match (. is a special character, so it needs escaping).
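Since the question stores the two markers in variables, the same pattern could be assembled from them. This is only a sketch: extract_between is a hypothetical helper name, and it relies on PCRE's \Q...\E quoting so that the . in marker2 is treated literally, which assumes neither marker contains the sequence \E.

```r
# Hypothetical helper (not from the answer): builds the pattern from two literal markers.
# \Q...\E tells PCRE to read the enclosed text literally, so "." needs no manual escaping.
extract_between <- function(x, left, right) {
  pattern <- paste0("\\Q", left, "\\E\\K\\d+(?=\\Q", right, "\\E)")
  regmatches(x, gregexpr(pattern, x, perl = TRUE))
}

string  <- "<img src='images/stimuli/32.png'><img src='images/stimuli/36.png'>"
marker1 <- "images/stimuli/"
marker2 <- ".png"

result <- extract_between(string, marker1, marker2)[[1]]
result
# [1] "32" "36"
```

Pasting marker2 in unquoted would let . match any character, which is one of the problems in the original gsub attempt.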

Answer 2

You can use str_extract_all from the package stringr:

library(stringr)
str_extract_all(string, "(?<=images/stimuli/)\\d+(?=\\.png)")
[[1]]
[1] "32" "36"

This solution uses a positive lookbehind, (?<=images/stimuli/), and a positive lookahead, (?=\\.png). Both are zero-width assertions: they check the surrounding text without becoming part of the match, so only the one or more digits, \\d+, sitting between the two are returned.
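The same lookaround pattern should also work in base R when perl = TRUE is set, which may help if stringr is not available. The sketch below uses a shortened version of the input string and additionally converts the result to integers, a step that is not part of the original answer.

```r
# Shortened version of the input from the question
string <- "<img src='images/stimuli/32.png'><img src='images/stimuli/36.png'>"

# Lookbehind and lookahead need perl = TRUE in base R
ids <- regmatches(string, gregexpr("(?<=images/stimuli/)\\d+(?=\\.png)", string, perl = TRUE))[[1]]
ids
# [1] "32" "36"

# The matches are character strings; convert them if numbers are needed
as.integer(ids)
# [1] 32 36
```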

Answer 1
4 Accepted 2020-05-27 11:14:02

Answer 2
1 2020-05-27 11:25:31
