Capturing the text after multiple optional strings into named groups with a regex

Question

I am trying to extract multiple strings from one long string, using a different pattern for each. Here is an example of the input string:

[Update 2]Number of students: 5[New]Break at 1:45 pm\nStudents involved are: John, Joseph, Maria\nLunch at 2:00pm\nActivities remaining: long jump, shuffle

There are three prefixes, and I want to extract the data that follows each of them: 'Students involved are:', 'Activities remaining:', 'Number of students:'. I managed to extract all three into named groups using the following regex:

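const s = "[Update 2]Number of students: 5[New]Break at 1:45 pm\nStudents involved are: John, Joseph, Maria\nLunch at 2:00pm\nActivities remaining: long jump, shuffle";
const pattern = /(?<=Number of students: )(?<number>[^\n]+).*?(?<=Students involved are: )(?<students>[^\n]+).*?(?<=Activities remaining: )(?<activities>[^\n]+)/gms;
const match = pattern.exec(s);
// Named groups are exposed on match.groups, not on the match object itself
const num = match.groups.number;
const activities = match.groups.activities;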

The above works. However, I run into a problem when one of the strings is missing: all three prefixes I am searching for are optional. How can I modify the regex to handle optional patterns? Or is there a better way of accomplishing this? Thanks!
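To make the problem concrete, here is a minimal sketch showing that the pattern above, with every part mandatory, returns `null` as soon as one prefix is absent, so reading `.groups` from the result would throw. The `missingStudents` input is a hypothetical variant of the sample, and the flags are reduced to `s` (the one that lets `.*?` cross newlines) so that reusing the regex across calls stays stateless.

```javascript
// The question's pattern with only the s flag, so that .*? can cross newlines.
// All three parts are mandatory and must appear in this order.
const pattern = /(?<=Number of students: )(?<number>[^\n]+).*?(?<=Students involved are: )(?<students>[^\n]+).*?(?<=Activities remaining: )(?<activities>[^\n]+)/s;

const complete = "[Update 2]Number of students: 5[New]Break at 1:45 pm\nStudents involved are: John, Joseph, Maria\nLunch at 2:00pm\nActivities remaining: long jump, shuffle";
// Hypothetical input: the same text with the "Students involved are:" line removed.
const missingStudents = "[Update 2]Number of students: 5[New]Break at 1:45 pm\nLunch at 2:00pm\nActivities remaining: long jump, shuffle";

console.log(pattern.exec(complete).groups.students); // John, Joseph, Maria
console.log(pattern.exec(missingStudents));          // null: one missing part fails the whole match
```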

Answer 1

I'm not sure you need lookbehind assertions for this use case.
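A small illustration of that point, using the question's sample input (the two regexes below are my own examples): the prefix can simply be matched as ordinary text, because a named group only contains what is inside its own parentheses.

```javascript
const s = "[Update 2]Number of students: 5[New]Break at 1:45 pm\nStudents involved are: John, Joseph, Maria\nLunch at 2:00pm\nActivities remaining: long jump, shuffle";

// With a lookbehind, as in the question: the prefix is checked but not consumed.
const withLookbehind = /(?<=Students involved are: )(?<students>[^\n]+)/;
// Without a lookbehind: the prefix is consumed as ordinary text, but it lies outside
// the named group, so the group still holds only the value.
const withoutLookbehind = /Students involved are: (?<students>[^\n]+)/;

console.log(withLookbehind.exec(s).groups.students);    // John, Joseph, Maria
console.log(withoutLookbehind.exec(s).groups.students); // John, Joseph, Maria
// The only difference is the overall match (index 0), which now includes the prefix.
console.log(withoutLookbehind.exec(s)[0]); // Students involved are: John, Joseph, Maria
```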

To answer your question, you can wrap each individual pattern in a non-capturing group followed by a question mark:

const r = /(?:[\s\S]*?Desc.1:\s*(?<tag1>[^\n]*))?(?:[\s\S]*?Desc.2:\s*(?<tag2>[^\n]*))?(?:[\s\S]*?Desc.3:\s*(?<tag3>[^\n]*))?/
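Applied to the question's prefixes, a sketch of this idea might look like the following. It assumes the prefixes, when present, appear in the order shown; `[\s\S]*?` before each label lets each optional group skip over unrelated lines such as "Lunch at 2:00pm", and `[^\n]*` captures up to the end of the line. Using `\d+` for the count is an assumption that it is always an integer, and `extract` is a hypothetical helper name.

```javascript
// Each label sits in an optional non-capturing group (...)?. Inside it, [\s\S]*? lazily
// skips any text, newlines included, up to the label, and [^\n]* captures the rest of that line.
// \d+ for the count assumes it is always an integer.
const r = /(?:[\s\S]*?Number of students: (?<number>\d+))?(?:[\s\S]*?Students involved are: (?<students>[^\n]*))?(?:[\s\S]*?Activities remaining: (?<activities>[^\n]*))?/;

// Hypothetical helper: every part is optional, so exec() always finds a match
// (possibly empty); a label that is absent leaves its group undefined.
function extract(text) {
  const { number, students, activities } = r.exec(text).groups;
  return { number, students, activities };
}

const full = "[Update 2]Number of students: 5[New]Break at 1:45 pm\nStudents involved are: John, Joseph, Maria\nLunch at 2:00pm\nActivities remaining: long jump, shuffle";
const noStudents = "[Update 2]Number of students: 5[New]Break at 1:45 pm\nLunch at 2:00pm\nActivities remaining: long jump, shuffle";

console.log(extract(full));
console.log(extract(noStudents).students); // undefined
```

The limitation is order: if a label appears before one that is listed earlier in the pattern, the earlier group's `[\s\S]*?` has already skipped past it, so that value is lost.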

If the values can come in any order, you can use a global match with an alternation:

const r = /x(?<tag1>.*)|y(?<tag2>.*)|z(?<tag3>.*)/g

for (const {groups: {tag1, tag2, tag3}} of source.matchAll(r)) {
  // In each match, only the group of the alternative that matched is defined
  ...
}
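A sketch of the same alternation with the question's actual prefixes, merging the per-match groups into one object. The `extract` helper and the `shuffled` input are hypothetical; `[^\n]*` plays the role of `.*` above (without the `s` flag, `.` also stops at a newline), and `\d+` assumes the count is an integer.

```javascript
// One alternative per label. With the g flag, matchAll() yields every occurrence,
// in whatever order the labels appear; [^\n]* captures up to the end of the line.
const r = /Number of students: (?<number>\d+)|Students involved are: (?<students>[^\n]*)|Activities remaining: (?<activities>[^\n]*)/g;

// Hypothetical helper that merges the per-match groups into one object.
function extract(text) {
  const result = {};
  for (const { groups } of text.matchAll(r)) {
    // Only the alternative that matched has a defined group; the others are undefined.
    for (const [name, value] of Object.entries(groups)) {
      if (value !== undefined) result[name] = value;
    }
  }
  return result;
}

// Hypothetical input: labels out of order and "Students involved are:" missing.
const shuffled = "Activities remaining: long jump, shuffle\nLunch at 2:00pm\nNumber of students: 5";
console.log(extract(shuffled)); // { activities: 'long jump, shuffle', number: '5' }
```

A missing label simply never produces a match, so its key is absent from the result instead of causing the whole match to fail.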

You can see it in action here.

Also, FYI, the flags you're using don't make much sense to me:

- g is useful for matching several times (e.g. with "".matchAll(/regexp/g)), but it is useless otherwise
- m makes the ^ and $ assertions also match at the start and end of each line, in addition to their usual duty, but you're not using them
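A small sketch (my own example, not from the answer) of what these two flags actually change. With `g`, `exec()` is stateful: it starts at `lastIndex` and updates it, so reusing the same regex can return `null` unexpectedly. `m` has no effect on a pattern that contains no `^` or `$`.

```javascript
const s = "[Update 2]Number of students: 5[New]Break at 1:45 pm\nStudents involved are: John, Joseph, Maria\nLunch at 2:00pm\nActivities remaining: long jump, shuffle";

const withG = /Activities remaining: (?<activities>[^\n]+)/g;
const withoutG = /Activities remaining: (?<activities>[^\n]+)/;
const withM = /Activities remaining: (?<activities>[^\n]+)/m;

// With g, exec() starts searching at lastIndex and then updates it, so repeated
// calls on the same string are not independent.
const first = withG.exec(s);  // match; lastIndex now points past it (end of string)
const second = withG.exec(s); // null: no match after lastIndex, which is reset to 0
const third = withG.exec(s);  // matches again from the start
console.log(first !== null, second, third !== null); // true null true

// Without g, every call searches from the beginning.
console.log(withoutG.exec(s) !== null, withoutG.exec(s) !== null); // true true

// m only changes what ^ and $ match; this pattern uses neither, so the result is the same.
console.log(withM.exec(s).groups.activities === withoutG.exec(s).groups.activities); // true
```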

Answer 2

Here's my attempt:

"^\[[^\]]+\](Number of students: )*(?<number>[^\n]+)\\n(Students involved are: )*(?<students>[^\n]+)\\n(Activities remaining: )*(?<activities>[^\n]+)"

The differences between mine and yours are:

- added ^\[[^\]]+\] at the beginning to match [<any characters>]
- added * after the optional parts of your string
- added \\n between the three paired parts
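The answer writes its pattern as a quoted string, apparently for a regex tester where `\\n` matches a literal backslash followed by `n` (its test inputs contain `\n` as text). As a JavaScript regex literal, the leading-tag part from the first bullet behaves as sketched below; the sample strings are taken from the question.

```javascript
// The leading part of Answer 2's pattern as a JavaScript regex literal:
// ^ anchors at the start, \[ and \] are literal brackets, and [^\]]+ matches
// one or more characters other than "]".
const leadingTag = /^\[[^\]]+\]/;

console.log(leadingTag.exec("[Update 2]Number of students: 5[New]Break at 1:45 pm")[0]); // [Update 2]
// ^ only matches at the start, so a bracketed tag later in the string is not matched.
console.log(leadingTag.exec("Number of students: 5[New]")); // null
```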

I've tested this regex with these two examples:

[Update 2]Number of students: 5[New]Break at 1:45 pm\nStudents involved are: John, Joseph, Maria\nLunch at 2:00pm\nActivities remaining: long jump, shuffle
[Update 2]5[New]Break at 1:45 pm\nJohn, Joseph, Maria\nLunch at 2:00pm\nlong jump, shuffle

Does it work for you?

PS: to make the pattern more efficient, more sample inputs would be needed.

2 answers

Answer 1
1 2022-04-24 19:20:51

Answer 2
0 2022-04-13 00:07:49
