
Extract email field with regex

I'm trying to extract the "email" value with this code:

const regex3 = /Email',\r\n      value: '([^']*)',/gm;
var content3 = fs.readFileSync('message.txt')
let m3;

while ((m3 = regex3.exec(content)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m3.index === regex3.lastIndex) {
        regex3.lastIndex++;
    }

    // The result can be accessed through the `m`-variable.
    m3.forEach((match, groupIndex) => {
        fs.appendFileSync('messagematch.txt', m3[1] + '\n');
    });
}

From this file

 },
MessageEmbedField {
  embed: [Circular *2],
  name: 'Email',
  value: 'user@gmail.com',
  inline: true
},
MessageE   

The regex works in Notepad, but not in my script. What am I missing?

I suggest changing your regex in a few ways to make it more robust and fault tolerant.

First, include the opening single quote before Email, so the pattern matches 'Email' as a whole and does not accidentally catch other fields whose value merely ends with the word "Email".

Second, use \r?\n to match both Windows-style (\r\n) and Unix-style (\n) line endings. I suspect this may be a large part of your issue, but can't be sure.

Third, use \s+ instead of hardcoding a fixed number of spaces. This helps to avoid problems caused by minor formatting changes.

The final regex would look like this:

const regex = /'Email',\r?\n\s+value: '([^']*)',/gm
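As a quick check of the line-ending point, here is a minimal sketch that runs the pattern above against the same snippet with Unix (\n) and Windows (\r\n) line endings. The sample strings and the firstEmail helper are made up for illustration and are not part of the question.

```javascript
// Pattern from the answer above: optional \r, flexible indentation.
const emailFieldRegex = /'Email',\r?\n\s+value: '([^']*)',/gm;

// Hypothetical helper: returns the first captured value, or null.
function firstEmail(text) {
    emailFieldRegex.lastIndex = 0; // reset, because the regex is global
    const m = emailFieldRegex.exec(text);
    return m ? m[1] : null;
}

// The same snippet, once with Unix and once with Windows line endings.
const unixText = "  name: 'Email',\n  value: 'user@gmail.com',\n  inline: true\n";
const windowsText = unixText.replace(/\n/g, '\r\n');

console.log(firstEmail(unixText));    // user@gmail.com
console.log(firstEmail(windowsText)); // user@gmail.com
```

A pattern that hardcodes \r\n would only find the value in the second string.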

What am I missing?

  • You use \r\n to match a Windows-style line break; making the \r optional (\r?\n) also matches a Unix-style line break.
  • Your code declares var content3, but then calls regex3.exec(content), which is a different variable.
  • The number of spaces before value: in your pattern differs from the number in the example data.
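With those three points applied, the loop from the question could look roughly like the sketch below. It assumes the file has already been read into a string (for example with fs.readFileSync('message.txt', 'utf8')); the extractEmails name and the inline sample are illustrative only.

```javascript
// Hypothetical helper: collects every captured Email value from the text.
function extractEmails(text) {
    // \r?\n accepts both line-ending styles; \s+ accepts any indentation.
    const regex = /'Email',\r?\n\s+value: '([^']*)'/g;
    const emails = [];
    // matchAll requires the g flag and avoids manual lastIndex bookkeeping.
    for (const match of text.matchAll(regex)) {
        emails.push(match[1]);
    }
    return emails;
}

// Sample input in the shape shown in the question.
const content3 = `MessageEmbedField {
  embed: [Circular *2],
  name: 'Email',
  value: 'user@gmail.com',
  inline: true
},`;

// Pass the variable that was actually declared (content3, not content).
console.log(extractEmails(content3)); // [ 'user@gmail.com' ]
```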

You could use \s+ instead of hardcoding the number of spaces, but note that \s also matches newlines.

If you want to match whitespace other than newlines, you could use the negated character class [^\S\r\n], which matches any character that is neither a non-whitespace character nor a carriage return or line feed.

'Email',\r?\n[^\S\r\n]+value:[^\S\r\n]+'([^\s@']+@[^\s@']+)'
  • 'Email', Match literally
  • \r?\n Match a newline, optionally preceded by a carriage return
  • [^\S\r\n]+ Match 1+ whitespace chars except newlines
  • value: Match literally
  • [^\S\r\n]+' Match 1+ whitespace chars except newlines, then '
  • ( Capture group 1
    • [^\s@']+@[^\s@']+ Match an email-like format
  • )' Close group 1 and match '
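To see why the distinction between \s+ and [^\S\r\n]+ can matter, the sketch below compares the two on a made-up input in which a blank line separates the name line from the value line. The variable names and both sample strings are assumptions for illustration.

```javascript
// Two variants of the pattern: \s+ may cross line breaks, [^\S\r\n]+ may not.
const loose = /'Email',\r?\n\s+value:\s+'([^\s@']+@[^\s@']+)'/;
const strict = /'Email',\r?\n[^\S\r\n]+value:[^\S\r\n]+'([^\s@']+@[^\s@']+)'/;

// Expected layout: value sits on the line directly after name.
const normal = "name: 'Email',\n  value: 'user@gmail.com',";
// Made-up variation with an extra blank line before value.
const withBlankLine = "name: 'Email',\n\n  value: 'user@gmail.com',";

console.log(loose.test(normal), strict.test(normal));               // true true
console.log(loose.test(withBlankLine), strict.test(withBlankLine)); // true false
```

Which behaviour is preferable depends on how strictly the layout of the file should be enforced.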


const regex3 = /'Email',\r?\n[^\S\r\n]+value:[^\S\r\n]+'([^\s@']+@[^\s@']+)'/g;
var content3 = ` },
MessageEmbedField {
  embed: [Circular *2],
  name: 'Email',
  value: 'user@gmail.com',
  inline: true
},
MessageE   `;
let m3;

while ((m3 = regex3.exec(content3)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m3.index === regex3.lastIndex) {
        regex3.lastIndex++;
    }
    console.log(m3[1]);
}

Alternatively, try your expression with the s (single-line) flag:

/Email'\s*,\s*value:\s*'([^'\r\n]*)'/gs

Test

const regex = /Email'\s*,\s*value:\s*'([^'\r\n]*)'/gs;
const str = ` },
MessageEmbedField {
  embed: [Circular *2],
  name: 'Email',
  value: 'user@gmail.com',
  inline: true
},
MessageE   `;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }

    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
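One detail that may be worth checking here: in JavaScript the s flag only changes how . matches, and this expression contains no ., so the match across lines appears to come from \s*, which already matches line breaks. A small sketch, with an illustrative sample string and variable names, compares the expression with and without the flag:

```javascript
// Illustrative input with the value on the line after the name.
const sample = "name: 'Email',\n  value: 'user@gmail.com',";

// The same expression with and without the s (dotAll) flag.
const withS = /Email'\s*,\s*value:\s*'([^'\r\n]*)'/s;
const withoutS = /Email'\s*,\s*value:\s*'([^'\r\n]*)'/;

console.log(withS.exec(sample)[1]);    // user@gmail.com
console.log(withoutS.exec(sample)[1]); // user@gmail.com
```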


If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com. There you can also see how it matches against some sample inputs.


RegEx Circuit

jex.im visualizes regular expressions:

[image: visualization of the expression]

You can try something like:

var test = ` },
MessageEmbedField {
  embed: [Circular *2],
  name: 'Email',
  value: 'user@gmail.com',
  inline: true
},
Message   `;
// With the i flag, [A-Z] matches letters in either case.
var myregexp = /name: 'Email',\s+value: '(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b)',/img;
var match = myregexp.exec(test);
console.log(match[1]);


The regex above matches only values that look like valid email addresses. If you want to match any value (as the original pattern did), use:

var myregexp = /name: 'Email',\s+value: '([^']*)',/img;
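The difference between the two patterns shows up when the field holds something that is not an email address. The sketch below uses a hypothetical capture helper and made-up sample strings, with the top-level-domain class written as [A-Z]:

```javascript
// Strict variant: the value must look like an email address.
const strictEmail = /name: 'Email',\s+value: '(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b)',/im;
// Permissive variant: any value without a single quote is accepted.
const anyValue = /name: 'Email',\s+value: '([^']*)',/im;

// Hypothetical helper: returns capture group 1, or null when there is no match.
function capture(regex, text) {
    const m = regex.exec(text);
    return m ? m[1] : null;
}

const valid = "name: 'Email',\n  value: 'user@gmail.com',";
const notAnEmail = "name: 'Email',\n  value: 'not provided',";

console.log(capture(strictEmail, valid));      // user@gmail.com
console.log(capture(anyValue, valid));         // user@gmail.com
console.log(capture(strictEmail, notAnEmail)); // null
console.log(capture(anyValue, notAnEmail));    // not provided
```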


