Extract email field with regex
I'm trying to extract the "email" with this code:
const regex3 = /Email',\r\n value: '([^']*)',/gm;
var content3 = fs.readFileSync('message.txt')
let m3;

while ((m3 = regex3.exec(content)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m3.index === regex3.lastIndex) {
        regex3.lastIndex++;
    }

    // The result can be accessed through the `m`-variable.
    m3.forEach((match, groupIndex) => {
        fs.appendFileSync('messagematch.txt', m3[1] + '\n');
    });
}
From this file:
},
MessageEmbedField {
embed: [Circular *2],
name: 'Email',
value: 'user@gmail.com',
inline: true
},
MessageE
The regex works in Notepad, but not in my script. What am I missing?
I suggest changing your regex in a few ways to make it more robust and fault tolerant.
First, include the opening single quote before Email, to avoid accidentally matching other fields where someone may have put the word "Email" in a value.
Second, use \r?\n to match both Windows and Unix-style line endings. I suspect this may be a large part of your issue, but I can't be sure.
Third, use \s+ instead of hardcoding a specific number of spaces. This helps avoid problems caused by minor formatting changes.
The final regex would look like this:
const regex = /'Email',\r?\n\s+value: '([^']*)',/gm
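A minimal sketch checking the final regex above against the sample in both line-ending styles. It uses String.prototype.matchAll (available in Node 12+), which requires the g flag. The helper name extractEmails and the sample strings are illustrative, and the four-space indentation of the dump is an assumption.

```javascript
// Final pattern from this answer; matchAll requires the g flag.
const regex = /'Email',\r?\n\s+value: '([^']*)',/gm;

// Hypothetical helper: return every captured email (group 1) in the text.
function extractEmails(text) {
  return Array.from(text.matchAll(regex), (m) => m[1]);
}

// Sample shaped like the dump in the question (indentation assumed).
const lf = [
  "  MessageEmbedField {",
  "    embed: [Circular *2],",
  "    name: 'Email',",
  "    value: 'user@gmail.com',",
  "    inline: true",
  "  },",
].join("\n");

// The same text with Windows (CRLF) line endings.
const crlf = lf.replace(/\n/g, "\r\n");

console.log(extractEmails(lf));   // [ 'user@gmail.com' ]
console.log(extractEmails(crlf)); // [ 'user@gmail.com' ]
```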
As for "what am I missing?":
You use \r\n to match a Windows-style line break, but you can make the \r optional (\r?\n) to also match a Unix-style one. See this page about line break characters.

In your code you declare var content3, but then you use regex3.exec(content).
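A sketch of the question's loop that uses content3 consistently. It also reads the file explicitly as a UTF-8 string, makes \r optional and uses \s+ for the indentation. The file names message.txt and messagematch.txt come from the question; the helper name extractEmails is hypothetical. Each email is written once: the original forEach ran once per element of the match array (the full match plus group 1), so it appended every address twice.

```javascript
const fs = require("fs");

// \r? accepts Windows and Unix line endings; \s+ absorbs the indentation.
const regex3 = /'Email',\r?\n\s+value: '([^']*)',/g;

// Hypothetical helper: collect group 1 of every match.
function extractEmails(content) {
  const emails = [];
  let m3;
  regex3.lastIndex = 0; // start from the beginning on every call
  while ((m3 = regex3.exec(content)) !== null) {
    // Guard against zero-width matches (harmless here: this pattern never matches "").
    if (m3.index === regex3.lastIndex) {
      regex3.lastIndex++;
    }
    emails.push(m3[1]); // group 1 only, once per match
  }
  return emails;
}

// Only touch the files when message.txt is present.
if (fs.existsSync("message.txt")) {
  const content3 = fs.readFileSync("message.txt", "utf8");
  for (const email of extractEmails(content3)) {
    fs.appendFileSync("messagematch.txt", email + "\n");
  }
}
```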
You could use \s+ instead of hardcoding the number of spaces, but note that \s can also match a newline.
If you want to match whitespace without a newline, you could use the negated character class [^\S\r\n], which matches any char that is neither a non-whitespace char nor a newline, i.e. any whitespace except \r and \n.
'Email',\r?\n[^\S\r\n]+value:[^\S\r\n]+'([^\s@']+@[^\s@']+)'
- 'Email', matches 'Email', literally
- \r?\n matches a newline (CRLF or LF)
- [^\S\r\n]+ matches 1+ whitespace chars except newlines
- value: matches value: literally
- [^\S\r\n]+' matches 1+ whitespace chars except newlines, then '
- ( opens capture group 1
  - [^\s@']+@[^\s@']+ matches an email-like format: 1+ chars that are not whitespace, @ or ', then @, then 1+ such chars again
- )' closes group 1 and matches '
const regex3 = /'Email',\r?\n[^\S\r\n]+value:[^\S\r\n]+'([^\s@']+@[^\s@']+)'/g;
var content3 = `
  },
  MessageEmbedField {
    embed: [Circular *2],
    name: 'Email',
    value: 'user@gmail.com',
    inline: true
  },
  MessageE
`;
let m3;

while ((m3 = regex3.exec(content3)) !== null) {
  // This is necessary to avoid infinite loops with zero-width matches
  if (m3.index === regex3.lastIndex) {
    regex3.lastIndex++;
  }
  console.log(m3[1]);
}
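To illustrate the difference between \s+ and [^\S\r\n]+ described in this answer, here is a small comparison on a made-up input where a blank line separates the two lines: \s+ silently crosses the extra newline, while [^\S\r\n]+ does not.

```javascript
// Hypothetical input: an extra blank line before "value:".
const text = "name: 'Email',\n\n    value: 'user@gmail.com'";

// \s+ also matches \n, so it skips over the blank line.
const loose = /'Email',\r?\n\s+value:\s+'([^\s@']+@[^\s@']+)'/;

// [^\S\r\n]+ means "whitespace except CR/LF", so the indentation must be on the very next line.
const strict = /'Email',\r?\n[^\S\r\n]+value:[^\S\r\n]+'([^\s@']+@[^\s@']+)'/;

const looseMatch = loose.exec(text);
const strictMatch = strict.exec(text);

console.log(looseMatch && looseMatch[1]); // user@gmail.com
console.log(strictMatch);                 // null
```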
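Maybe try a more permissive expression, where \s* absorbs the line break and the indentation. The s (single line, or dotAll) flag is included below, although it only changes what . matches and this pattern contains no ., so the result is the same without it: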
/Email'\s*,\s*value:\s*'([^'\r\n]*)'/gs
const regex = /Email'\s*,\s*value:\s*'([^'\r\n]*)'/gs;
const str = `
  },
  MessageEmbedField {
    embed: [Circular *2],
    name: 'Email',
    value: 'user@gmail.com',
    inline: true
  },
  MessageE
`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }

    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
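A sketch of the same search using String.prototype.matchAll, which returns an iterator of match arrays and removes the manual lastIndex bookkeeping of the exec loop. It runs the pattern with and without the s flag to check that both give the same result; the sample text is an abbreviated copy of the dump in the question.

```javascript
const withS = /Email'\s*,\s*value:\s*'([^'\r\n]*)'/gs;
const withoutS = /Email'\s*,\s*value:\s*'([^'\r\n]*)'/g;

const str = [
  "  MessageEmbedField {",
  "    name: 'Email',",
  "    value: 'user@gmail.com',",
  "    inline: true",
  "  },",
].join("\n");

// matchAll requires the g flag; keep only capture group 1 of each match.
const collect = (re) => [...str.matchAll(re)].map((m) => m[1]);

console.log(collect(withS));    // [ 'user@gmail.com' ]
console.log(collect(withoutS)); // [ 'user@gmail.com' ]
```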
If you wish to simplify, modify or explore the expression, it is explained in the top right panel of regex101.com. If you'd like, you can also see in this link how it would match against some sample inputs.
You can try something like:
var test = `
  },
  MessageEmbedField {
    embed: [Circular *2],
    name: 'Email',
    value: 'user@gmail.com',
    inline: true
  },
  Message
`;
var myregexp = /name: 'Email',\s+value: '(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b)',/img;
var match = myregexp.exec(test);
console.log(match[1]);
The regex above only matches values that look like typical email addresses (it does not fully validate them). If you want to match anything between the quotes, as your original pattern did, use:
var myregexp = /name: 'Email',\s+value: '([^']*)',/img;
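A sketch contrasting the two patterns on a hypothetical field whose value is not an address: the permissive one returns whatever is between the quotes, while the email-shaped one finds nothing. The sample strings are made up, and the g flag is dropped here so that each exec call starts from the beginning of its input.

```javascript
// Email-shaped pattern from this answer (case-insensitive via the i flag).
const strict = /name: 'Email',\s+value: '(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b)',/im;
// Permissive pattern: anything up to the closing quote.
const loose = /name: 'Email',\s+value: '([^']*)',/im;

// Hypothetical inputs.
const samples = {
  valid: "name: 'Email',\n    value: 'user@gmail.com',",
  invalid: "name: 'Email',\n    value: 'not provided',",
};

for (const [label, text] of Object.entries(samples)) {
  const s = strict.exec(text);
  const l = loose.exec(text);
  console.log(label, s && s[1], "|", l && l[1]);
}
// valid user@gmail.com | user@gmail.com
// invalid null | not provided
```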