Regular expression to get a string between two strings in JavaScript

I have found very similar posts, but I can't quite get my regular expression right here.

I am trying to write a regular expression which returns a string which is between two other strings. For example: I want to get the string which resides between the strings "cow" and "milk".

My cow always gives milk

would return

"always gives"

Here is the expression I have pieced together so far:

(?=cow).*(?=milk)

However, this returns the string "cow always gives".

A lookahead (that (?= part) does not consume any input. It is a zero-width assertion (as are boundary checks and lookbehinds).

You want a regular match here, to consume the cow portion. To capture the portion in between, you use a capturing group (just put the portion of the pattern you want to capture inside parentheses):

cow(.*)milk

No lookaheads are needed at all.
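To see the difference between the two patterns, here is a minimal sketch (the variable names are illustrative and not part of the original answer). Note that the capturing group still contains the spaces around the words, because the pattern itself does not match them outside the group.

```javascript
const text = "My cow always gives milk";

// Lookaheads are zero-width: "cow" is not consumed, so it stays in the match,
// and "milk" is only asserted, so it is left out.
const withLookaheads = text.match(/(?=cow).*(?=milk)/)[0];

// Here "cow" and "milk" are consumed, and Group 1 holds what lies between them.
const withGroup = text.match(/cow(.*)milk/)[1];

console.log(JSON.stringify(withLookaheads)); // "cow always gives "
console.log(JSON.stringify(withGroup));      // " always gives "
```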

Regular expression to get a string between two strings in JavaScript

The most complete solution that will work in the vast majority of cases is using a capturing group with a lazy dot matching pattern. However, a dot . in JavaScript regex does not match line break characters, so what will work in 100% of cases is a [^] or [\s\S] / [\d\D] / [\w\W] construct.

ECMAScript 2018 and newer compatible solution

In JavaScript environments supporting ECMAScript 2018, the s modifier allows . to match any char including line break chars, and the regex engine supports lookbehinds of variable length. So, you may use a regex like

var result = s.match(/(?<=cow\s+).*?(?=\s+milk)/gs); // Returns multiple matches if any
// Or
var result = s.match(/(?<=cow\s*).*?(?=\s*milk)/gs); // Same but whitespaces are optional

In both cases, the regex checks that the current position is preceded by cow followed by whitespace (one or more whitespace chars in the first pattern, zero or more in the second). Then any 0+ chars, as few as possible, are matched and consumed (that is, added to the match value), and finally the regex checks that milk follows (preceded by one or more, or zero or more, whitespace chars respectively).
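As a minimal sketch of the first pattern in action (the input string is made up for illustration, and an engine with ES2018 lookbehind support is assumed):

```javascript
const input = "My cow always\ngives milk, their cow also gives milk";

// The lookbehind and lookahead keep "cow" and "milk" out of the match value.
// The s flag lets . cross the line break; the g flag collects every match.
const between = input.match(/(?<=cow\s+).*?(?=\s+milk)/gs);

console.log(between); // [ 'always\ngives', 'also gives' ]
```

Because the delimiters are only asserted, the returned values need no trimming or group access.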

Scenario 1: Single-line input

This and all other scenarios below are supported by all JavaScript environments. See usage examples at the bottom of the answer.

cow (.*?) milk

cow is found first, then a space, then any 0+ chars other than line break chars, as few as possible since *? is a lazy quantifier, are captured into Group 1, and then a space followed by milk must follow (and those are matched and consumed, too).

Scenario 2: Multiline input

cow ([\s\S]*?) milk

Here, cow and a space are matched first, then any 0+ chars, as few as possible, are matched and captured into Group 1, and then a space followed by milk is matched.
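The difference between the two scenarios can be checked with a short sketch (the sample string is an assumption made for illustration):

```javascript
const multiline = "My cow always\ngives milk";

// . cannot match the line break, so the single-line pattern finds nothing.
const dotResult = multiline.match(/cow (.*?) milk/);

// [\s\S] matches any character, including \n.
const anyCharResult = multiline.match(/cow ([\s\S]*?) milk/);

console.log(dotResult);                        // null
console.log(JSON.stringify(anyCharResult[1])); // "always\ngives"
```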

Scenario 3: Overlapping matches

If you have a string like >>>15 text>>>67 text2>>> and you need to get 2 matches in-between >>> + number + whitespace and >>>, you can't use />>>\d+\s(.*?)>>>/g, as this will only find 1 match because the >>> before 67 is already consumed upon finding the first match. You may use a positive lookahead to check for the text presence without actually "gobbling" it (i.e. appending it to the match):

/>>>\d+\s(.*?)(?=>>>)/g

See the online regex demo yielding text and text2 as the Group 1 contents found.

Also see How to get all possible overlapping matches for a string.
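A small sketch comparing the consuming pattern with the lookahead version on the string from this scenario (variable names are illustrative):

```javascript
const s = ">>>15 text>>>67 text2>>>";

// The trailing >>> is consumed, so it cannot start the second match.
const consuming = Array.from(s.matchAll(/>>>\d+\s(.*?)>>>/g), m => m[1]);

// The trailing >>> is only asserted, so it is still available for the next match.
const withLookahead = Array.from(s.matchAll(/>>>\d+\s(.*?)(?=>>>)/g), m => m[1]);

console.log(consuming);     // [ 'text' ]
console.log(withLookahead); // [ 'text', 'text2' ]
```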

Performance considerations

A lazy dot matching pattern ( .*? ) inside regex patterns may slow down script execution if very long input is given. In many cases, the unroll-the-loop technique helps considerably. Trying to grab everything between cow and milk from "Their\ncow\ngives\nmore\nmilk", we see that we just need to match all lines that do not start with milk, so instead of cow\n([\s\S]*?)\nmilk we can use:

/cow\n(.*(?:\n(?!milk$).*)*)\nmilk/gm

See the regex demo (if there can be \r\n, use /cow\r?\n(.*(?:\r?\n(?!milk$).*)*)\r?\nmilk/gm). With this small test string, the performance gain is negligible, but with very large text you will feel the difference (especially if the lines are long and line breaks are not very numerous).
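The following sketch only checks that the unrolled pattern captures the same text as the lazy one on the sample string; it does not measure performance.

```javascript
const sample = "Their\ncow\ngives\nmore\nmilk";

const lazyPattern = /cow\n([\s\S]*?)\nmilk/;
// Unrolled: match one line, then keep adding lines that are not exactly "milk".
const unrolledPattern = /cow\n(.*(?:\n(?!milk$).*)*)\nmilk/m;

const lazyGroup = sample.match(lazyPattern)[1];
const unrolledGroup = sample.match(unrolledPattern)[1];

console.log(JSON.stringify(lazyGroup));     // "gives\nmore"
console.log(JSON.stringify(unrolledGroup)); // "gives\nmore"
```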

Sample regex usage in JavaScript:

// Single/first match expected: use no global modifier and access match[1]
console.log("My cow always gives milk".match(/cow (.*?) milk/)[1]);

// Multiple matches: get multiple matches with a global modifier and
// trim the results if the length of the leading/trailing delimiters is known
var s = "My cow always gives milk, their cow also gives milk";
console.log(s.match(/cow (.*?) milk/g).map(function(x) { return x.substr(4, x.length - 9); }));

// Or use RegExp#exec inside a loop to collect all the Group 1 contents
var result = [], m, rx = /cow (.*?) milk/g;
while ((m = rx.exec(s)) !== null) {
  result.push(m[1]);
}
console.log(result);

Using the modern String#matchAll method

const s = "My cow always gives milk, their cow also gives milk";
const matches = s.matchAll(/cow (.*?) milk/g);
console.log(Array.from(matches, x => x[1]));

Here's a regex which will grab what's between cow and milk (without leading/trailing space):

var srctext = "My cow always gives milk.";
var re = /(.*cow\s+)(.*)(\s+milk.*)/;
var newtext = srctext.replace(re, "$2");

An example: http://jsfiddle.net/entropo/tkP74/

  • You need to capture the .*
  • You can (but don't have to) make the .* nongreedy
  • There's really no need for the lookahead.

     > /cow(.*?)milk/i.exec('My cow always gives milk');
     ["cow always gives milk", " always gives "]
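Whether the .* should be nongreedy matters once the delimiters occur more than once. A minimal sketch, using a made-up input with two cow/milk pairs:

```javascript
const twoPairs = "My cow always gives milk, their cow also gives milk";

// Greedy: .* runs to the end and backtracks to the LAST "milk".
const greedyGroup = twoPairs.match(/cow(.*)milk/)[1];

// Lazy: .*? stops at the FIRST "milk" after "cow".
const lazyGroup = twoPairs.match(/cow(.*?)milk/)[1];

console.log(JSON.stringify(greedyGroup)); // " always gives milk, their cow also gives "
console.log(JSON.stringify(lazyGroup));   // " always gives "
```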

The chosen answer didn't work for me...hmm...

Just add a space after cow and/or before milk to trim the spaces from " always gives "

/(?<=cow ).*(?= milk)/

I was able to get what I needed using Martinho Fernandes' solution below. The code is:

var test = "My cow always gives milk";

var testRE = test.match("cow(.*)milk");
alert(testRE[1]);

You'll notice that I am alerting an element of testRE rather than testRE itself. This is because match() returns an array: index 0 holds the full match and index 1 holds the first capturing group. The output from:

My cow always gives milk

Changes into:

always gives

I find regex to be tedious and time consuming given the syntax. Since you are already using JavaScript, it is easier to do the following without regex:

const text = 'My cow always gives milk'
const start = `cow`;
const end = `milk`;
const middleText = text.split(start)[1].split(end)[0]
console.log(middleText) // prints " always gives " (including the surrounding spaces)
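One caveat with the split approach is that text.split(start)[1] is undefined when the start string is missing, so the second split throws. A minimal sketch of a more defensive variant follows; the between helper is hypothetical and not part of the original answer.

```javascript
// Hypothetical helper: returns the trimmed text between the first `start`
// and the next `end`, or null when either delimiter is missing.
function between(text, start, end) {
  const from = text.indexOf(start);
  if (from === -1) return null;
  const contentStart = from + start.length;
  const to = text.indexOf(end, contentStart);
  if (to === -1) return null;
  return text.slice(contentStart, to).trim();
}

console.log(between("My cow always gives milk", "cow", "milk"));  // "always gives"
console.log(between("My goat always gives milk", "cow", "milk")); // null
```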

Just use the following regular expression:

(?<=My cow\s).*?(?=\smilk)

If the data is on multiple lines, then you may have to use the following:

/My cow ([\s\S]*)milk/gm

My cow always gives 
milk

Regex 101 example

You can use the match() method to extract a substring between two strings. Try the following code:

var str = "My cow always gives milk";
var subStr = str.match("cow(.*)milk");
console.log(subStr[1]);

Output:

always gives

See a complete example here: How to find sub-string between two strings.

The match() method searches a string for a match and returns an Array object.

// Original string
var str = "My cow always gives milk";

// Using index [0] would return
// "cow always gives milk"
str.match(/cow(.*)milk/)[0]


// Using index [1] would return
// " always gives " (including the surrounding spaces)
str.match(/cow(.*)milk/)[1]

Task

Extract the substring between two strings (excluding the two strings themselves)

Solution

let allText = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum";
let textBefore = "five centuries,";
let textAfter = "electronic typesetting";
var regExp = new RegExp(`(?<=${textBefore}\\s)(.+?)(?=\\s+${textAfter})`, "g");
var results = regExp.exec(allText);
if (results && results.length > 1) {
    console.log(results[0]);
}
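When the delimiters come from variables, as in the solution above, any regex metacharacters in them (parentheses, dots, brackets) change the meaning of the pattern. A minimal sketch of one way to guard against that; escapeRegExp and extractBetween are hypothetical helper names, and lookbehind support (ES2018) is assumed.

```javascript
// Hypothetical helper: escape characters that are special in a regex so that
// the delimiter is matched literally.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: same pattern shape as above, built from escaped delimiters.
function extractBetween(text, before, after) {
  const pattern = new RegExp(
    `(?<=${escapeRegExp(before)}\\s)(.+?)(?=\\s+${escapeRegExp(after)})`
  );
  const match = pattern.exec(text);
  return match ? match[0] : null;
}

console.log(extractBetween("price (USD) is 42 dollars [approx.]", "(USD)", "[approx.]"));
// "is 42 dollars"
```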

You can use destructuring to focus only on the part you are interested in.

So you can do:

let str = "My cow always gives milk";
let [, result] = str.match(/\bcow\s+(.*?)\s+milk\b/) || [];
console.log(result);

In this way you ignore the first part (the complete match) and only get the capture group's match. The addition of || [] is useful if you are not sure there will be a match at all. In that case match would return null, which cannot be destructured, so we fall back to [] instead, and result will then be undefined.

The additional \b ensures the surrounding words "cow" and "milk" are really separate words (e.g. not "milky"). Also, \s+ is needed so that the match does not include the outer spacing.
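A short sketch of how this pattern behaves on matching and non-matching input (the wrapper function and the sample sentences are illustrative assumptions):

```javascript
// Illustrative wrapper around the destructuring pattern.
function betweenCowAndMilk(str) {
  const [, result] = str.match(/\bcow\s+(.*?)\s+milk\b/) || [];
  return result;
}

console.log(betweenCowAndMilk("My cow always gives milk")); // "always gives"
console.log(betweenCowAndMilk("My cow likes milky tea"));   // undefined (\b rejects "milky")
console.log(betweenCowAndMilk("No match here"));            // undefined
```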
