Extracting all numbers in a string that are surrounded by a certain pattern in R
I'd like to extract all numbers in a string that are flanked by two markers/patterns. However, regular expressions in R are my bane.
I have something like this:
string <- "<img src='images/stimuli/32.png' style='width:341.38790035587186px;height: 265px;'><img src='images/stimuli/36.png' style='width:341.38790035587186px;height: 265px;'>"
marker1 <- "images/stimuli/"
marker2 <- ".png"
and I want something like this:
gsub(paste0(".*", marker1, "*(.*?) *", marker2, ".*"), "\\1", string)
[1] "32" "36"
However, I get this:
[1] "32"
P.S. If someone knows a good guide for understanding how regular expressions work here, please let me know. I am pretty sure the answer is simple, but I just don't get regex :(
You may use
string <- "<img src='images/stimuli/32.png' style='width:341.38790035587186px;height: 265px;'><img src='images/stimuli/36.png' style='width:341.38790035587186px;height: 265px;'>"
regmatches(string, gregexpr("images/stimuli/\\K\\d+(?=\\.png)", string, perl=TRUE))[[1]]
# => [1] "32" "36"
NOTE: If there can be anything between the markers, not just numbers, you may replace \\d+ with .*?.
See the R demo and a regex demo.
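As a rough sketch of the note above, the lazy .*? variant can be tried on an input whose file names are not purely numeric. The string x below is made up for illustration and is not from the question.

```r
# Hypothetical input: the parts between the markers are not just digits
x <- "<img src='images/stimuli/a1.png'><img src='images/stimuli/b-2.png'>"

# Same pattern as above, with \\d+ swapped for the lazy .*?
res <- regmatches(x, gregexpr("images/stimuli/\\K.*?(?=\\.png)", x, perl = TRUE))[[1]]
print(res)
```

The lazy quantifier matters here: a greedy .* would run on to the last .png in the string and return a single long match.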
The regmatches with gregexpr extract all matches found in the input.
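To see why the answer ends in [[1]], here is a small sketch with a made-up two-element vector x: regmatches returns a list with one element per input string, and an element with no matches comes back as an empty character vector.

```r
# Hypothetical input vector: the second element contains no match
x <- c("images/stimuli/1.png images/stimuli/22.png", "no match here")

# gregexpr finds every match position in each element of x
m <- gregexpr("images/stimuli/\\K\\d+(?=\\.png)", x, perl = TRUE)

# regmatches turns those positions into the matched substrings,
# returning a list with one character vector per element of x
res <- regmatches(x, m)
print(res)
```

With a single input string, as in the question, the list has one element, so [[1]] pulls out the character vector of matches.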
The regex matches:
images/stimuli/ - a literal string
\K - a match reset operator discarding all text matched so far
\d+ - 1+ digits
(?=\.png) - a positive lookahead that requires a .png substring immediately to the right (. is a special character, it needs escaping).
You can use str_extract_all from the package stringr:
library(stringr)
str_extract_all(string, "(?<=images/stimuli/)\\d+(?=\\.png)")
[[1]]
[1] "32" "36"
This solution uses a positive lookbehind, (?<=images/stimuli/), and a positive lookahead, (?=\\.png). Both are zero-width assertions, so the text they check is not part of the match; only the one or more digits, \\d+, sitting between the two are returned.
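Since the question keeps the two markers in variables, one possible way to build the pattern from them is sketched below. It assumes base R with perl = TRUE and relies on PCRE's \Q...\E quoting so that characters such as the dot in ".png" are matched literally; it would not work if a marker itself contained \E. The input string is a shortened version of the one in the question.

```r
# Shortened version of the question's input (style attributes dropped)
string <- "<img src='images/stimuli/32.png'><img src='images/stimuli/36.png'>"
marker1 <- "images/stimuli/"
marker2 <- ".png"

# \\Q...\\E makes PCRE treat the marker text literally, so no manual escaping is needed
pattern <- paste0("\\Q", marker1, "\\E\\K\\d+(?=\\Q", marker2, "\\E)")

res <- regmatches(string, gregexpr(pattern, string, perl = TRUE))[[1]]
print(res)
# => [1] "32" "36"
```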