Extract email field with regex

I'm trying to extract the "email" with this code:

const regex3 = /Email',\r\n      value: '([^']*)',/gm;
var content3 = fs.readFileSync('message.txt')
let m3;

while ((m3 = regex3.exec(content)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m3.index === regex3.lastIndex) {
        regex3.lastIndex++;
    }

    // The result can be accessed through the `m`-variable.
    m3.forEach((match, groupIndex) => {
        fs.appendFileSync('messagematch.txt', m3[1] + '\n');
    });
}

From this file:

 },
MessageEmbedField {
  embed: [Circular *2],
  name: 'Email',
  value: 'user@gmail.com',
  inline: true
},
MessageE   

The regex works in Notepad, but not in my script. What am I missing?

I suggest changing your regex in a few ways to make it more robust and fault tolerant.

First, include the opening single quote before Email (that is, match 'Email' rather than Email') to avoid accidentally catching other fields where someone may have put the word "Email" at the end of a value.

Second, use \r?\n to match both Windows and Unix-style line endings. I suspect this may be a large part of your issue, but I can't be sure.

Third, use \s+ instead of hardcoding a specific number of spaces. This helps avoid problems caused by minor formatting changes.

The final regex would look like this:

const regex = /'Email',\r?\n\s+value: '([^']*)',/gm
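
As a quick check of the pattern above, here is a minimal sketch that runs it against a shortened copy of the sample text with both Unix and Windows line endings. The `extractEmails` helper and the two sample strings are illustrative names, not part of the original script.

```javascript
// Pattern from the answer above: optional \r, flexible indentation.
const regex = /'Email',\r?\n\s+value: '([^']*)',/gm;

// Hypothetical helper: returns every captured value found in `text`.
function extractEmails(text) {
  // matchAll requires the g flag and iterates over a copy of the regex,
  // so lastIndex never has to be managed by hand.
  return Array.from(text.matchAll(regex), (m) => m[1]);
}

// The same sample with Unix (\n) and Windows (\r\n) line endings.
const unixSample = "  name: 'Email',\n  value: 'user@gmail.com',\n  inline: true\n";
const windowsSample = unixSample.replace(/\n/g, '\r\n');

console.log(extractEmails(unixSample));    // [ 'user@gmail.com' ]
console.log(extractEmails(windowsSample)); // [ 'user@gmail.com' ]
```

Both calls find the address because \r? makes the carriage return optional.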

what I'm missing?

  • You use \r\n to match a Windows-style line break, but you can make the \r optional to also match a Unix-style line break.
  • In your code you declare var content3, but you then call regex3.exec(content) with a different name.
  • The number of spaces in the pattern differs from the number of spaces in the example data.
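
Putting those points together, a corrected version of the script might look like the sketch below. It is only one possible arrangement: the `extractToFile` helper and the temporary directory are assumptions made so the example can run on its own, while the file names come from the question. Note also that in the question's loop, `m3.forEach` runs once per element of the match array (the full match and group 1), so `m3[1]` would be appended twice per match; the sketch appends it once.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: reads `inputPath` and appends one address per line to `outputPath`.
function extractToFile(inputPath, outputPath) {
  // Passing 'utf8' makes readFileSync return a string rather than a Buffer.
  const content = fs.readFileSync(inputPath, 'utf8');
  const regex = /'Email',\r?\n\s+value: '([^']*)',/g;

  // Search the same variable that was read, and take group 1 once per match.
  const emails = Array.from(content.matchAll(regex), (m) => m[1]);
  for (const email of emails) {
    fs.appendFileSync(outputPath, email + '\n');
  }
  return emails;
}

// Demo on a temporary copy of the sample data, written with Windows line endings.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'email-'));
const input = path.join(dir, 'message.txt');
const output = path.join(dir, 'messagematch.txt');
fs.writeFileSync(
  input,
  "MessageEmbedField {\r\n  name: 'Email',\r\n  value: 'user@gmail.com',\r\n  inline: true\r\n},\r\n"
);

const found = extractToFile(input, output);
console.log(found); // [ 'user@gmail.com' ]
```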

You could use \s+ instead of hardcoding the number of spaces, but \s can also match a newline.

If you want to match whitespace without newlines, you could use the negated character class [^\S\r\n], which matches any character except a non-whitespace character, a carriage return, or a newline.

'Email',\r?\n[^\S\r\n]+value:[^\S\r\n]+'([^\s@']+@[^\s@']+)'
  • 'Email', Match literally
  • \r?\n Match a newline
  • [^\S\r\n]+ Match 1+ whitespace chars except newlines
  • value: Match literally
  • [^\S\r\n]+' Match 1+ whitespace chars except newlines, then '
  • ( Capture group 1
    • [^\s@']+@[^\s@']+ Match an email-like format
  • )' Close group 1 and match '


const regex3 = /'Email',\r?\n[^\S\r\n]+value:[^\S\r\n]+'([^\s@']+@[^\s@']+)'/g;
var content3 = ` },
MessageEmbedField {
  embed: [Circular *2],
  name: 'Email',
  value: 'user@gmail.com',
  inline: true
},
MessageE   `;
let m3;

while ((m3 = regex3.exec(content3)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m3.index === regex3.lastIndex) {
        regex3.lastIndex++;
    }
    console.log(m3[1]);
}
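
To make the difference between \s+ and [^\S\r\n]+ concrete, here is a small sketch that compares the two on input where a blank line separates the fields. The names `loose`, `strict`, `adjacent`, and `separated` are illustrative, and the patterns are simplified variants of the ones discussed above.

```javascript
// Loose: \s+ also matches line breaks, so it can span a blank line.
const loose = /'Email',\r?\n\s+value:\s+'([^']*)'/;
// Strict: [^\S\r\n]+ matches whitespace such as spaces and tabs, but never \r or \n.
const strict = /'Email',\r?\n[^\S\r\n]+value:[^\S\r\n]+'([^']*)'/;

// The two fields on consecutive lines, as in the sample data.
const adjacent = "name: 'Email',\n  value: 'user@gmail.com',";
// The same fields with a blank line between them.
const separated = "name: 'Email',\n\n  value: 'user@gmail.com',";

console.log(loose.test(adjacent), strict.test(adjacent));   // true true
console.log(loose.test(separated), strict.test(separated)); // true false
```

Which behavior is preferable depends on how much the layout of the dump is expected to vary.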

Maybe try your expression in s (single-line) mode:

/Email'\s*,\s*value:\s*'([^'\r\n]*)'/gs

Test

const regex = /Email'\s*,\s*value:\s*'([^'\r\n]*)'/gs;
const str = ` },
MessageEmbedField {
  embed: [Circular *2],
  name: 'Email',
  value: 'user@gmail.com',
  inline: true
},
MessageE   `;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }

    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}


If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com. There you can also see how it matches against some sample inputs.


RegEx Circuit

jex.im visualizes regular expressions:

[image: jex.im diagram of the expression above]

You can try something like:

var test = ` },
MessageEmbedField {
  embed: [Circular *2],
  name: 'Email',
  value: 'user@gmail.com',
  inline: true
},
Message   `;
var myregexp = /name: 'Email',\s+value: '(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b)',/img;
var match = myregexp.exec(test);
console.log(match[1]);


The regex above matches only values that look like an email address. If you want to match anything between the quotes (as in your original pattern), use:

var myregexp = /name: 'Email',\s+value: '([^']*)',/img;
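
The two variants can be compared side by side. The sketch below is illustrative only: `firstGroup` is a hypothetical helper, and the second sample value is made up to show a field that is not an email address. The helper also guards against a failed match, since `exec` returns null in that case and reading `match[1]` directly would throw.

```javascript
// Strict: the value must look like an email address (i flag: case-insensitive).
const emailOnly = /name: 'Email',\s+value: '(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b)',/i;
// Loose: accept whatever sits between the quotes.
const anything = /name: 'Email',\s+value: '([^']*)',/i;

// Hypothetical helper: returns group 1, or null when there is no match.
function firstGroup(regex, text) {
  const match = regex.exec(text);
  return match ? match[1] : null;
}

const valid = "name: 'Email',\n  value: 'user@gmail.com',";
const notAnEmail = "name: 'Email',\n  value: 'not provided',";

console.log(firstGroup(emailOnly, valid));      // user@gmail.com
console.log(firstGroup(anything, notAnEmail));  // not provided
console.log(firstGroup(emailOnly, notAnEmail)); // null
```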

