
Extracting all numbers in a string that are surrounded by a certain pattern in R

I'd like to extract all numbers in a string that are flanked by two markers/patterns. However, regular expressions in R are my bane.

I have something like this:

string  <- "<img src='images/stimuli/32.png' style='width:341.38790035587186px;height: 265px;'><img src='images/stimuli/36.png' style='width:341.38790035587186px;height: 265px;'>"
marker1 <- "images/stimuli/"
marker2 <- ".png"

I tried the call below and want a result like this:

gsub(paste0(".*", marker1, "*(.*?) *", marker2, ".*"), "\\1", string)

[1] "32" "36"

However, I get this:

[1] "32"

PS: If someone has a good guide to understanding how regular expressions work here, please let me know. I am pretty sure that the answer is pretty simple, but I just don't get regex :(

You may use

string  <- "<img src='images/stimuli/32.png' style='width:341.38790035587186px;height: 265px;'><img src='images/stimuli/36.png' style='width:341.38790035587186px;height: 265px;'>"
regmatches(string, gregexpr("images/stimuli/\\K\\d+(?=\\.png)", string, perl=TRUE))[[1]]
# => [1] "32" "36"

NOTE: If there can be anything between the markers, not just digits, you may replace \\d+ with .*? (a lazy match of any characters).
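As a minimal sketch of that variant, the snippet below swaps \\d+ for .*? on an input whose file names are not purely numeric. The string `string2` and its file names are made up for illustration and are not part of the original question.

```r
# Hypothetical input: file names that are not purely numeric
string2 <- "<img src='images/stimuli/cat_01.png'><img src='images/stimuli/dog.png'>"

# .*? is lazy, so each match stops at the first ".png" that follows it
res_any <- regmatches(
  string2,
  gregexpr("images/stimuli/\\K.*?(?=\\.png)", string2, perl = TRUE)
)[[1]]

print(res_any)
# [1] "cat_01" "dog"
```

The lazy quantifier matters here: a greedy .* would run on to the last ".png" in the string and return one long match instead of two short ones.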

See the R demo and a regex demo.

regmatches together with gregexpr extracts all matches found in the input.
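To illustrate why gregexpr rather than regexpr is needed for this, here is a small comparison. It uses a shortened version of the question's string (the style attributes are dropped for brevity), and the variable names `first_only` and `all_matches` are only illustrative.

```r
# Shortened version of the string from the question
string <- "<img src='images/stimuli/32.png'><img src='images/stimuli/36.png'>"
pattern <- "images/stimuli/\\K\\d+(?=\\.png)"

# regexpr() reports only the first match in each input element
first_only <- regmatches(string, regexpr(pattern, string, perl = TRUE))

# gregexpr() reports every match; regmatches() then returns a list with
# one character vector per input element, hence the [[1]]
all_matches <- regmatches(string, gregexpr(pattern, string, perl = TRUE))[[1]]

print(first_only)
# [1] "32"
print(all_matches)
# [1] "32" "36"
```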

The regex matches:

  • images/stimuli/ - a literal string
  • \K - a match reset operator that discards all text matched so far
  • \d+ - one or more digits
  • (?=\.png) - a positive lookahead that requires a .png substring immediately to the right without adding it to the match ( . is a special character, so it needs escaping).
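The question keeps the two markers in variables, so the same pattern can probably be assembled from them. The sketch below assumes the engine is PCRE (perl = TRUE), where \Q...\E quotes its contents literally, and it assumes the markers themselves never contain "\E". The string is shortened from the question for brevity.

```r
# Shortened version of the string from the question
string  <- "<img src='images/stimuli/32.png'><img src='images/stimuli/36.png'>"
marker1 <- "images/stimuli/"
marker2 <- ".png"

# \Q...\E makes PCRE treat the marker text literally, so the "." in ".png"
# needs no manual escaping (assumption: markers do not contain "\E")
pattern <- paste0("\\Q", marker1, "\\E\\K\\d+(?=\\Q", marker2, "\\E)")

between <- regmatches(string, gregexpr(pattern, string, perl = TRUE))[[1]]

print(between)
# [1] "32" "36"
```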

You can use str_extract_all from the package stringr:

library(stringr)
str_extract_all(string, "(?<=images/stimuli/)\\d+(?=\\.png)")
[[1]]
[1] "32" "36"

This solution uses a positive lookbehind, (?<=images/stimuli/) , and a positive lookahead, (?=\\.png) . Both are zero-width assertions: they check the text around the match without including it in the result, so only the one or more digits, \\d+ , sitting between the two are returned.
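The same lookaround pattern should also work in base R when perl = TRUE is set, which avoids the stringr dependency. This is a sketch on a shortened version of the question's string; the conversion with as.integer is an optional extra step that the original answers do not include.

```r
# Shortened version of the string from the question
string <- "<img src='images/stimuli/32.png'><img src='images/stimuli/36.png'>"

# Lookbehind and lookahead only assert their context, so the match itself
# contains just the digits
digits <- regmatches(
  string,
  gregexpr("(?<=images/stimuli/)\\d+(?=\\.png)", string, perl = TRUE)
)[[1]]

# Optional: convert the extracted text to integers
numbers <- as.integer(digits)

print(digits)
# [1] "32" "36"
print(numbers)
# [1] 32 36
```

One caveat: PCRE lookbehinds cannot contain unbounded quantifiers such as * or +, which a literal marker like images/stimuli/ does not need anyway.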
