Positive look behind in JavaScript regular expression

I have a document from which I need to extract some data. The document contains strings like this:

Text:"How secure is my information?"

I need to extract text which is in double quotes after the literal Text:

How secure is my information?

How do I do this with a regex in JavaScript?

Lookbehind assertions were recently finalised for JavaScript and will be in the next publication of the ECMA-262 specification. They are supported in Chrome 66 (Opera 53), but by no other major browsers at the time of writing (see caniuse).

var str = 'Text:"How secure is my information?"',
    reg = /(?<=Text:")[^"]+(?=")/;

str.match(reg)[0];
// -> How secure is my information?

Older browsers do not support lookbehind in JavaScript regular expressions. For expressions like this one, use capturing parentheses instead:

var str = 'Text:"How secure is my information?"',
    reg = /Text:"([^"]+)"/;

str.match(reg)[1];
// -> How secure is my information?

This will not cover all the lookbehind assertion use cases, however.
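
One way to choose between the two patterns at runtime is to test whether the engine can compile a lookbehind at all. The sketch below is an assumption about how that could be wired up, not part of the original answers; supportsLookbehind and extractText are hypothetical helper names. The lookbehind pattern is built with the RegExp constructor because a regex literal containing (?<= ) would be a syntax error for the whole script in engines that lack support.

```javascript
// Returns true when the engine can compile a lookbehind assertion.
function supportsLookbehind() {
  try {
    new RegExp('(?<=a)b');
    return true;
  } catch (e) {
    return false; // SyntaxError in engines without lookbehind
  }
}

// Extracts the quoted text after Text:, or null when nothing matches.
function extractText(str) {
  if (supportsLookbehind()) {
    var viaLookbehind = str.match(new RegExp('(?<=Text:")[^"]+(?=")'));
    return viaLookbehind ? viaLookbehind[0] : null;
  }
  // Fallback for older engines: read capture group 1 instead.
  var viaGroup = str.match(/Text:"([^"]+)"/);
  return viaGroup ? viaGroup[1] : null;
}

console.log(extractText('Text:"How secure is my information?"'));
// -> How secure is my information?
```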

I just want to add something: JavaScript engines that predate ES2018 don't support lookbehinds like (?<= ) or (?<! ) .

But they do support lookaheads like (?= ) or (?! ) .

You can just do:

/Text:"(.*?)"/

Explanation:

  • Text:" : To be matched literally
  • .*? : To match anything in non-greedy way
  • () : To capture the match
  • " : To match a literal "
  • / / : delimiters
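
Put together, the pattern can be applied with exec and the captured text read from index 1 of the result. A minimal sketch; the variable names are illustrative:

```javascript
var str = 'Text:"How secure is my information?"';

// exec returns null when nothing matches, so guard before indexing
var match = /Text:"(.*?)"/.exec(str);
var extracted = match ? match[1] : null;

console.log(extracted);
// -> How secure is my information?
```

Unlike [^"]+, the lazy .*? also accepts an empty pair of quotes (Text:""), in which case the captured group is an empty string.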

If you want to avoid the regular expression altogether, you can do:

var texts = file.split('Text:"').slice(1).map(function (text) {
  return text.slice(0, text.lastIndexOf('"')); 
});
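
// With the g flag, match() returns the whole matches (including the Text:" prefix), not the captured groups.
string.match(/Text:"([^"]*)"/g)
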
<script type="text/javascript">
var str = 'Text:"How secure is my information?"';
var obj = eval('({'+str+'})')
console.log(obj.Text);
</script>
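
When the g flag is set, String.prototype.match returns only the whole matches and drops the capture groups. A minimal sketch of collecting just the captured text from every Text:"..." occurrence uses matchAll instead; this assumes an ES2020 environment, and sample and extractAll are illustrative names rather than anything from the answers above:

```javascript
const sample =
  'Text:"How secure is my information?" Voice:"Not very much" Text:"How to improve this?"';

function extractAll(input) {
  // matchAll requires the g flag and yields one match array per occurrence;
  // index 1 of each match array is the first capture group.
  return Array.from(input.matchAll(/Text:"([^"]*)"/g), (m) => m[1]);
}

console.log(extractAll(sample));
// -> [ 'How secure is my information?', 'How to improve this?' ]
```

When nothing matches, extractAll returns an empty array rather than null.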

Here is an example showing how you can approach this.

1) Given this input string:

const inputText = 
`Text:"How secure is my information?"someRandomTextHere
Voice:"Not very much"
Text:"How to improve this?"
Voice:"Don't use '123456' for your password"
Text:"OK just like in the "Hackers" movie."`;

2) Extract data in double quotes after the literal Text: so that the result is an array with all matches, like so:

["How secure is my information?",
 "How to improve this?",
 "OK just like in the \"Hackers\" movie."]

SOLUTION

function getText(text) {
  return text
    .match(/Text:".*"/g)
    .map(item => item.match(/^Text:"(.*)"/)[1]);
}

console.log(JSON.stringify(    getText(inputText)    ));

If you, like me, get here while researching a bug related to the Cloudinary gem, you may find this useful:

Cloudinary recently released version 1.16.0 of their gem. In Safari, this crashes with the error 'Invalid regular expression: invalid group specifier name'.

A bug report has been filed. In the meantime I reverted to 1.15.0 and the error went away.

Hope this saves someone some lifetime.

A regular expression with lookbehind

regex = /(?<=.*?:).*/g

can be used to produce an array with all matches found in the inputText (from Piotr Berebecki's answer):

> inputText.match(regex)
[
  '"How secure is my information?"someRandomTextHere',
  '"Not very much"',
  '"How to improve this?"',
  `"Don't use '123456' for your password"`,
  '"OK just like in the "Hackers" movie."'
]

Each match consists of everything that follows the first colon in a line.
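
If only the Text: lines are wanted, and without the surrounding quotes, the lookbehind can be anchored to the start of each line with the m flag. This is a sketch that assumes an engine with lookbehind support; textOnly is an illustrative name, and the input repeats the sample string from the earlier answer.

```javascript
const inputText =
`Text:"How secure is my information?"someRandomTextHere
Voice:"Not very much"
Text:"How to improve this?"
Voice:"Don't use '123456' for your password"
Text:"OK just like in the "Hackers" movie."`;

// ^ matches at each line start because of the m flag; the greedy .*
// backtracks to the last double quote on the line, so inner quotes survive.
// "|| []" covers the case where match() returns null.
const textOnly = inputText.match(/(?<=^Text:").*(?=")/gm) || [];

console.log(textOnly);
```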

In the absence of lookbehinds, a regular expression with groups can be used:

regex = /(.*?:)(.*)/g

With this, each match consists of a complete line, with two groups: the first containing the part up to the colon and the second containing the rest.

> inputText.match(regex)
[
  'Text:"How secure is my information?"someRandomTextHere',
  'Voice:"Not very much"',
  'Text:"How to improve this?"',
  `Voice:"Don't use '123456' for your password"`,
  'Text:"OK just like in the "Hackers" movie."'
]

To see the groups, you can use the .exec method (with the g flag, .match returns only the whole matches). The first match looks like this:

> [...regex.exec(inputText)]
[
  'Text:"How secure is my information?"someRandomTextHere',
  'Text:',
  '"How secure is my information?"someRandomTextHere'
]
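
The numbered groups can also be given names, which makes it clearer what each part of a match holds. The sketch below assumes an engine with ES2018 named capture groups; label, rest and byLabel are illustrative names, and unlike the pattern above the colon is kept outside both groups.

```javascript
const inputText =
`Text:"How secure is my information?"someRandomTextHere
Voice:"Not very much"
Text:"How to improve this?"
Voice:"Don't use '123456' for your password"
Text:"OK just like in the "Hackers" movie."`;

const regex = /(?<label>.*?):(?<rest>.*)/g;
const byLabel = {};
let m;

// exec advances regex.lastIndex on every call because of the g flag
while ((m = regex.exec(inputText)) !== null) {
  const { label, rest } = m.groups;
  if (!byLabel[label]) byLabel[label] = [];
  byLabel[label].push(rest);
}

console.log(Object.keys(byLabel));
// -> [ 'Text', 'Voice' ]
```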

To loop over all matches and process only the second group of each (that is, the part after the colon from each line), use something like:

> for (var m, regex = /(.*?:)(.*)/g; m = regex.exec(inputText); ) console.log(m[2]);
"How secure is my information?"someRandomTextHere
"Not very much"
"How to improve this?"
"Don't use '123456' for your password"
"OK just like in the "Hackers" movie."
