
Extracting numbers from string in R using regex

I have a string like this:

myString <- "[0.15][4577896]blahblahblahblahwhatever"

I need to extract the number inside the second pair of brackets.

Currently I am trying to use this:

str_extract(myString, "\\]\\[(\\d+)")

But this gives me: ][4577896

My desired result would be: 4577896

How could I achieve this?

No lookbehind is needed:

gsub(".*\\[(\\d+).*","\\1",myString)
[1] "4577896"

You can try this: (?<=\\]\\[)(\\d+)

This is a demo: https://regex101.com/r/fvHW05/1
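The answer above gives only the pattern, so here is a minimal sketch of one way to apply it in base R. It assumes the same myString as in the question; the use of regexpr and regmatches is my choice for illustration, not something the answer specifies. Note that perl = TRUE is needed because lookbehind is PCRE syntax.

```r
myString <- "[0.15][4577896]blahblahblahblahwhatever"

# Locate the digits that are immediately preceded by "][".
# perl = TRUE switches to PCRE, which supports the (?<=...) lookbehind.
m <- regexpr("(?<=\\]\\[)\\d+", myString, perl = TRUE)

# regmatches() returns only the matched text; the lookbehind part is not included.
result <- regmatches(myString, m)
result
# [1] "4577896"
```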

Here is another version with minimal or no regex:

qdapRegex::ex_between_multiple(myString, "[", "]")[[2]]
#[1] "4577896"

It extracts all the substrings between [ and ], and we select the value inside the second pair of brackets. You can convert it to numeric or integer if needed.
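The same idea (collect every bracketed value, pick the second, convert it) can be sketched in base R for readers who do not have qdapRegex installed. This is an illustrative alternative, not the qdapRegex implementation; the variable names matches, inside, and second are made up for the example.

```r
myString <- "[0.15][4577896]blahblahblahblahwhatever"

# Find every "[...]" group in the string (PCRE syntax for the bracket class).
matches <- regmatches(myString, gregexpr("\\[[^\\]]*\\]", myString, perl = TRUE))[[1]]

# Drop the surrounding brackets from each match.
inside <- substr(matches, 2, nchar(matches) - 1)

# Take the second value and convert it from character to numeric.
second <- as.numeric(inside[2])
second
# [1] 4577896
```

Use as.integer instead of as.numeric if an integer is required and the value fits in R's 32-bit integer range.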

You may use

^(?:[^\[\]]*\[[^\[\]]+\])[^\]\[]*\[([^\]\[]+).+

Then replace the whole match with the first captured group using gsub; see a demo on regex101.com. In base R:

myString <- "[0.15][4577896]blahblahblahblahwhatever"

pattern <- "^(?:[^\\[\\]]*\\[[^\\[\\]]+\\])[^\\]\\[]*\\[([^\\]\\[]+).+"
gsub(pattern, "\\1", myString, perl = TRUE)
# [1] "4577896"

An option using str_extract:

library(stringr)
str_extract(myString, "(?<=.\\[)([0-9]+)")
#[1] "4577896"
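For context on why the attempt in the question returned ][4577896: str_extract returns the entire match, not the capture group, so the literal ][ in the pattern comes along with the digits. A minimal sketch of keeping the question's pattern and pulling out only the group, using base R's regexec (stringr's str_match works along the same lines and returns the groups as matrix columns):

```r
myString <- "[0.15][4577896]blahblahblahblahwhatever"

# regexec() records the full match and each parenthesised group.
m <- regmatches(myString, regexec("\\]\\[([0-9]+)", myString))[[1]]

# Element 1 is the full match, element 2 is the first capture group.
full_match <- m[1]
result <- m[2]
result
# [1] "4577896"
```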
