Extracting an email field with a regular expression

Question

I'm trying to extract the "email" with this code:

const regex3 = /Email',\r\n      value: '([^']*)',/gm;
var content3 = fs.readFileSync('message.txt')
let m3;

while ((m3 = regex3.exec(content)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m3.index === regex3.lastIndex) {
        regex3.lastIndex++;
    }

    // The result can be accessed through the `m`-variable.
    m3.forEach((match, groupIndex) => {
        fs.appendFileSync('messagematch.txt', m3[1] + '\n');
    });
}

From this file:

 },
MessageEmbedField {
  embed: [Circular *2],
  name: 'Email',
  value: 'user@gmail.com',
  inline: true
},
MessageE

The regex works in Notepad, but not in my script. What am I missing?

Answer 1

I suggest changing your regex in a few ways to make it more robust and fault tolerant.

First, include the initial single quote in 'Email' to avoid accidentally catching other fields where someone may have put the word "Email" as a value.

Second, use \r?\n to match both Windows and Unix-style line endings. I suspect this may be a large part of your issue, but I can't be sure.

Third, use \s+ instead of hardcoding a specific number of spaces. This will help to avoid problems caused by minor formatting changes.

The final regex would look like this:

const regex = /'Email',\r?\n\s+value: '([^']*)',/gm
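As a quick check of the second point, here is a minimal sketch that runs the pattern above against the same field with Unix and with Windows line endings. The sample strings and the firstEmail helper are made up for illustration; they are not part of the original question.

```javascript
// The pattern suggested above.
const regex = /'Email',\r?\n\s+value: '([^']*)',/gm;

// Hypothetical sample inputs: the same field with LF and with CRLF endings.
const unixText = "  name: 'Email',\n  value: 'user@gmail.com',\n  inline: true\n";
const windowsText = unixText.replace(/\n/g, '\r\n');

// Hypothetical helper: returns the first captured value, or null.
function firstEmail(text) {
  // matchAll works on a copy of the regex, so no lastIndex state leaks between calls.
  const match = [...text.matchAll(regex)][0];
  return match ? match[1] : null;
}

console.log(firstEmail(unixText));    // user@gmail.com
console.log(firstEmail(windowsText)); // user@gmail.com
```

A pattern with a hardcoded \r\n would only match the second input.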

Answer 2

what I'm missing?

You use \r\n to match a Windows-style line break, but you can make the \r optional to also match a Unix-style one. See this page about line break characters.
In your code you declare var content3, but you then use it as regex3.exec(content).
Also, the number of spaces in the pattern differs from the number in the example data.

You could use \s+ instead of hardcoding the number of spaces, but \s can also match a newline.

If you want to match whitespace without newlines, you could use a negated character class [^\S\r\n], which matches any char except a non-whitespace char, a carriage return, or a newline.

'Email',\r?\n[^\S\r\n]+value:[^\S\r\n]+'([^\s@']+@[^\s@']+)'

'Email', Match literally
\r?\n Match a newline
[^\S\r\n]+ Match 1+ whitespace chars except newlines
value: Match literally
[^\S\r\n]+' Match 1+ whitespace chars except newlines, then match '
( Capture group 1
- [^\s@']+@[^\s@']+ Match an email-like format
)' Close group 1 and match '
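To make the difference between \s and [^\S\r\n] concrete, here is a small sketch. The input string is hypothetical: it puts a blank line between the name and value lines, which is a case where the two classes disagree.

```javascript
// Hypothetical input: a blank line sits between the name line and the value line.
const gapped = "name: 'Email',\n\n  value: 'user@gmail.com'";
// Input shaped like the question's data: value on the very next line.
const adjacent = "name: 'Email',\n  value: 'user@gmail.com'";

const loose = /'Email',\r?\n\s+value:/;          // \s+ may run across line breaks
const strict = /'Email',\r?\n[^\S\r\n]+value:/;  // horizontal whitespace only

console.log(loose.test(adjacent), strict.test(adjacent)); // true true
console.log(loose.test(gapped), strict.test(gapped));     // true false
```

Which behavior is preferable depends on how strictly the layout of the file should be enforced.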

Regex demo

const regex3 = /'Email',\r?\n[^\S\r\n]+value:[^\S\r\n]+'([^\s@']+@[^\s@']+)'/g;
var content3 = ` },
MessageEmbedField {
  embed: [Circular *2],
  name: 'Email',
  value: 'user@gmail.com',
  inline: true
},
MessageE
`;
let m3;

while ((m3 = regex3.exec(content3)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m3.index === regex3.lastIndex) {
        regex3.lastIndex++;
    }

    console.log(m3[1]);
}
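Putting the points of this answer together, the script from the question might be restructured roughly as follows. This is only a sketch: extractEmails is a hypothetical helper name, and the file handling is left as comments because it assumes message.txt exists. One further detail worth noting is that the forEach in the question runs once per entry of the match array (the full match and group 1), so it would append m3[1] twice for every match.

```javascript
// Hypothetical helper: returns every captured email value found in `text`.
function extractEmails(text) {
  const regex = /'Email',\r?\n[^\S\r\n]+value:[^\S\r\n]+'([^\s@']+@[^\s@']+)'/g;
  // Each matchAll entry is [fullMatch, group1]; keep only group 1.
  return Array.from(text.matchAll(regex), (m) => m[1]);
}

// Sample input copied from the question.
const sample = ` },
MessageEmbedField {
  embed: [Circular *2],
  name: 'Email',
  value: 'user@gmail.com',
  inline: true
},
MessageE
`;

console.log(extractEmails(sample)); // [ 'user@gmail.com' ]

// With the files from the question (assumes message.txt exists):
// const fs = require('fs');
// const content3 = fs.readFileSync('message.txt', 'utf8');
// const emails = extractEmails(content3);
// if (emails.length > 0) fs.appendFileSync('messagematch.txt', emails.join('\n') + '\n');
```

Using matchAll removes the need for the manual lastIndex bookkeeping, since it iterates over all matches of a global regex on its own.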

Answer 3

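Maybe try your expression in s (single line) mode. Note that the s flag only changes how . matches, so in the pattern below it is the \s* that actually lets the match span the line break: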

/Email'\s*,\s*value:\s*'([^'\r\n]*)'/gs

Test

const regex = /Email'\s*,\s*value:\s*'([^'\r\n]*)'/gs;
const str = ` },
MessageEmbedField {
  embed: [Circular *2],
  name: 'Email',
  value: 'user@gmail.com',
  inline: true
},
MessageE
`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }

    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
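As a side check, the sketch below compares this pattern with and without the s flag. The sample string is a shortened, hypothetical version of the question's data. Because the s (dotAll) flag only affects the . metacharacter and the pattern contains none, both variants should capture the same value.

```javascript
// Shortened, hypothetical version of the question's data.
const sample = "  name: 'Email',\n  value: 'user@gmail.com',\n";

const withS = /Email'\s*,\s*value:\s*'([^'\r\n]*)'/s;
const withoutS = /Email'\s*,\s*value:\s*'([^'\r\n]*)'/;

console.log(withS.exec(sample)[1]);    // user@gmail.com
console.log(withoutS.exec(sample)[1]); // user@gmail.com

// dotAll only matters when the pattern contains a dot:
console.log(/a.b/.test('a\nb'));  // false
console.log(/a.b/s.test('a\nb')); // true
```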

If you wish to simplify, modify, or explore the expression, it is explained in the top right panel of regex101.com. If you'd like, you can also watch in this link how it would match against some sample inputs.

RegEx Circuit

jex.im visualizes regular expressions:

Answer 4

You can try something like:

var test = ` },
MessageEmbedField {
  embed: [Circular *2],
  name: 'Email',
  value: 'user@gmail.com',
  inline: true
},
Message
`;
// With the i flag, [A-Z] also matches lowercase letters.
var myregexp = /name: 'Email',\s+value: '(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b)',/img;
var match = myregexp.exec(test);
console.log(match[1]);

The regex above matches only strings that look like valid email addresses. If you want to match anything (as it was), use:

var myregexp = /name: 'Email',\s+value: '([^']*)',/img;

Regex Demo & Explanation
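A single exec call returns only the first match. If the file contains several Email fields, all of them can be collected with matchAll, as in this sketch. The two-field input is hypothetical, and the m flag is dropped here because the pattern has no ^ or $ anchors for it to affect.

```javascript
// Hypothetical input with two Email fields.
const test = `
  name: 'Email',
  value: 'user@gmail.com',
  inline: true
},
MessageEmbedField {
  name: 'Email',
  value: 'other@example.org',
`;

const myregexp = /name: 'Email',\s+value: '([^']*)',/gi;

// Each matchAll entry is [fullMatch, group1]; keep only the captured value.
const emails = Array.from(test.matchAll(myregexp), (m) => m[1]);
console.log(emails); // [ 'user@gmail.com', 'other@example.org' ]
```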

4 answers

Answer 1
1 2019-11-15 22:19:32

Answer 2
1 2019-11-16 11:38:58

Answer 3
0 2019-11-15 21:39:53

Answer 4
0 2019-11-15 21:42:14
