Capture text after multiple optional strings into named group with Regex
I am trying to extract multiple strings using different patterns from one long string. Here is an example of the input string:
[Update 2]Number of students: 5[New]Break at 1:45 pm\nStudents involved are: John, Joseph, Maria\nLunch at 2:00pm\nActivities remaining: long jump, shuffle
Three prefixes mark the data I want to extract: 'Students involved are:', 'Activities remaining:', 'Number of students:'. I managed to extract the text after each prefix into a named group using the following Regex:
let pattern = /(?<=Number of students: )(?<number>[^\n]+).*?(?<=Students involved are: )(?<students>[^\n]+).*?(?<=Activities remaining: )(?<activities>[^\n]+)/gms
let match = pattern.exec(s) // s is the input string shown above
const num = match.groups.number; // named groups are exposed on match.groups
const activities = match.groups.activities;
The above works. However, I run into an issue when one of the strings is missing. All three prefixes I am searching for are optional. How can I modify the regex to handle optional patterns? Or is there a better way of accomplishing this? Thanks!
I'm not sure you need lookbehind assertions for that use case...
To answer your question, you can wrap each individual pattern inside a non-capturing group followed by a question mark:
const r = /(?:Desc.1:\s*(?<tag1>.*?))?(?:Descr.2:\s*(?<tag2>.*?))?(?:Desc.3:\s*(?<tag3>.*?))?/
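As a minimal sketch of how that idea might be applied to the input from the question: the pattern name `optionalFields` and the helper `extractFields` are hypothetical, and capturing the count with `\d+` is an assumption about the data. Each optional group first skips ahead lazily to its prefix, and the value is captured with a greedy class, because a lazy `.*?` at the very end of a group would match as little as possible.

```javascript
// Hypothetical pattern: every prefix sits in its own optional non-capturing group.
// The s flag lets "." cross newlines; \d+ for the count is an assumption.
const optionalFields = /^(?:.*?Number of students: (?<number>\d+))?(?:.*?Students involved are: (?<students>[^\n]+))?(?:.*?Activities remaining: (?<activities>[^\n]+))?/s;

function extractFields(text) {
  // The pattern can match the empty string, so exec() never returns null here.
  return { ...optionalFields.exec(text).groups };
}

const full = "[Update 2]Number of students: 5[New]Break at 1:45 pm\nStudents involved are: John, Joseph, Maria\nLunch at 2:00pm\nActivities remaining: long jump, shuffle";
const partial = "Number of students: 5\nActivities remaining: chess";

console.log(extractFields(full));
console.log(extractFields(partial).students); // undefined: that prefix is missing
```

This only works when the prefixes appear in the order written in the pattern. A prefix that shows up out of order is left undefined.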
If the values can come in any order, you can use a global match and a disjunction:
const r = /x(?<tag1>.*)|y(?<tag2>.*)|z(?<tag3>.*)/g
for (const {groups: {tag1, tag2, tag3}} of source.matchAll(r)) {
...
}
You can see it in action here
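Here is a rough sketch of the disjunction approach applied to the prefixes from the question. The helper name `collectFields` is hypothetical, and `\d+` for the count is an assumption. Each match fills in exactly one named group, so the loop copies only the groups that are defined.

```javascript
// One alternative per prefix; the g flag is required by matchAll().
const anyOrder = /Number of students: (?<number>\d+)|Students involved are: (?<students>[^\n]+)|Activities remaining: (?<activities>[^\n]+)/g;

function collectFields(text) {
  const result = {};
  for (const { groups } of text.matchAll(anyOrder)) {
    // Only the alternative that matched has a defined group.
    for (const [name, value] of Object.entries(groups)) {
      if (value !== undefined) result[name] = value;
    }
  }
  return result;
}

const sample = "[Update 2]Number of students: 5[New]Break at 1:45 pm\nStudents involved are: John, Joseph, Maria\nLunch at 2:00pm\nActivities remaining: long jump, shuffle";

console.log(collectFields(sample));
// { number: '5', students: 'John, Joseph, Maria', activities: 'long jump, shuffle' }
```

Missing prefixes leave no key in the result, and the order of the prefixes in the input does not matter.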
Also, FYI, the flags you're using don't make a lot of sense to me:
g is useful to match several times (e.g. with "".matchAll(/regexp/g)), but it is useless otherwise
m makes the ^ and $ assertions match the start and end of lines on top of their usual duty, but you're not using them

Here's my attempt:
"^\[[^\]]+\](Number of students: )*(?<number>[^\n]+)\\n(Students involved are: )*(?<students>[^\n]+)\\n(Activities remaining: )*(?<activities>[^\n]+)"
The differences between mine and yours are the following:
^\[[^\]]+\] at the beginning, to match [<any characters>]
* after the optional parts of your string
\\n between the three paired parts
I've tested this regex with these two examples:
[Update 2]Number of students: 5[New]Break at 1:45 pm\nStudents involved are: John, Joseph, Maria\nLunch at 2:00pm\nActivities remaining: long jump, shuffle
[Update 2]5[New]Break at 1:45 pm\nJohn, Joseph, Maria\nLunch at 2:00pm\nlong jump, shuffle
Does it work for you?
P.S. To make the pattern any more efficient, more sample inputs would be needed.