Capture text after multiple optional strings into named group with Regex

I am trying to extract multiple strings using different patterns from one long string. Here is an example of the input string:

[Update 2]Number of students: 5[New]Break at 1:45 pm\nStudents involved are: John, Joseph, Maria\nLunch at 2:00pm\nActivities remaining: long jump, shuffle

Three prefixes mark the data to extract, which follows each of them: 'Students involved are:', 'Activities remaining:', and 'Number of students:'. I managed to extract the values into named groups using the following regex:

let pattern = /(?<=Number of students: )(?<number>[^\n]+).*?(?<=Students involved are: )(?<students>[^\n]+).*?(?<=Activities remaining: )(?<activities>[^\n]+)/gms
let match = pattern.exec(s)
// Named groups are exposed on match.groups, not on the match object itself
const num = match.groups.number;
const activities = match.groups.activities;

The above works. However, I run into an issue when one of the strings is missing. All three prefixes I am searching for are optional. How can I modify the regex to handle optional patterns? Or is there a better way of accomplishing this? Thanks!

I'm not sure you need lookbehind assertions for that use case...

To answer your question, you can wrap each individual pattern inside a non-capturing group followed by a question mark:

const r = /(?:Desc.1:\s*(?<tag1>.*?))?(?:Desc.2:\s*(?<tag2>.*?))?(?:Desc.3:\s*(?<tag3>.*?))?/
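
As a minimal sketch of this idea applied to the input from the question, the pattern below makes each prefix-plus-value pair optional. It assumes the text contains real newline characters and that the fields, when present, appear in the order shown in the question. The function name `extractFields` and the shortened `partial` sample are made up for illustration. Each optional group starts with a lazy `.*?` (with the `s` flag, so it can cross newlines) to skip ahead to its prefix.

```javascript
// Hypothetical helper: each "prefix + value" pair sits in an optional
// non-capturing group, so a missing prefix leaves its named group undefined.
// Assumes real newlines in the text and a fixed field order.
function extractFields(text) {
  const pattern = /^(?:.*?Number of students: (?<number>[^\n]+))?(?:.*?Students involved are: (?<students>[^\n]+))?(?:.*?Activities remaining: (?<activities>[^\n]+))?/s;
  // Every part is optional, so the pattern always matches (possibly the empty string).
  return pattern.exec(text).groups;
}

const full = "[Update 2]Number of students: 5[New]Break at 1:45 pm\nStudents involved are: John, Joseph, Maria\nLunch at 2:00pm\nActivities remaining: long jump, shuffle";
// Illustrative input with the "Students involved are:" line missing.
const partial = "[Update 2]Number of students: 5\nActivities remaining: long jump, shuffle";

const fullFields = extractFields(full);
const partialFields = extractFields(partial);
console.log(fullFields.students, partialFields.students);
```

Note that `number` captures the rest of its line (`5[New]Break at 1:45 pm` for the full sample), just as `[^\n]+` does in the pattern from the question; a narrower class such as `\d+` would be needed if only the digits are wanted.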

If the values can come in any order, you can use a global match and a disjunction:

const r = /x(?<tag1>.*)|y(?<tag2>.*)|z(?<tag3>.*)/g

for (const {groups: {tag1, tag2, tag3}} of source.matchAll(r)) {
 ...
}
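
For the prefixes in the question, the disjunction approach might look like the sketch below. It assumes real newline characters in the text; the function name `extractAnyOrder` and the sample string are hypothetical. Each match fills exactly one of the named groups, so the loop copies only the defined ones into the result.

```javascript
// Hypothetical helper: one alternative per prefix, matched globally.
// Each match sets a single named group; the others are undefined.
function extractAnyOrder(text) {
  const r = /Number of students: (?<number>[^\n]+)|Students involved are: (?<students>[^\n]+)|Activities remaining: (?<activities>[^\n]+)/g;
  const result = {};
  for (const { groups } of text.matchAll(r)) {
    for (const [name, value] of Object.entries(groups)) {
      if (value !== undefined) result[name] = value;
    }
  }
  return result;
}

// Illustrative input: fields out of order, "Students involved are:" missing.
const shuffled = "Activities remaining: long jump, shuffle\nNumber of students: 5";
const anyOrderFields = extractAnyOrder(shuffled);
console.log(anyOrderFields);
```

Because missing prefixes simply produce no match, the result object only has keys for the fields that were actually found.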

You can see it in action here.

Also, FYI, the flags you're using don't make a lot of sense to me:

  • g is useful to match several times (e.g. with "".matchAll(/regexp/g)), but otherwise it does nothing useful here
  • m makes the ^ and $ assertions match the start and end of each line in addition to the start and end of the string, but you're not using them
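
A small sketch of what these flags change, using made-up sample strings. The `s` flag, which the question also uses, is included for completeness.

```javascript
const text = "first\nsecond";

// Without m, ^ and $ only match at the start and end of the whole string.
const withoutM = /^second$/.test(text);   // false
// With m, they also match at line boundaries.
const withM = /^second$/m.test(text);     // true

// With g, exec() is stateful: each call resumes from lastIndex.
const globalRe = /a/g;
const first = globalRe.exec("aa").index;  // 0
const second = globalRe.exec("aa").index; // 1
const third = globalRe.exec("aa");        // null, and lastIndex resets to 0

// With s, the dot also matches newline characters.
const dotWithoutS = /a.b/.test("a\nb");   // false
const dotWithS = /a.b/s.test("a\nb");     // true
```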

Here's my attempt:

"^\[[^\]]+\](Number of students: )*(?<number>[^\n]+)\\n(Students involved are: )*(?<students>[^\n]+)\\n(Activities remaining: )*(?<activities>[^\n]+)"

The differences between mine and yours are the following:

  • added ^\[[^\]]+\] at the beginning to match [<any characters>]
  • added * after the optional parts of your pattern
  • added \\n between the three paired parts

I've tested this regex with these two examples:

  • [Update 2]Number of students: 5[New]Break at 1:45 pm\nStudents involved are: John, Joseph, Maria\nLunch at 2:00pm\nActivities remaining: long jump, shuffle
  • [Update 2]5[New]Break at 1:45 pm\nJohn, Joseph, Maria\nLunch at 2:00pm\nlong jump, shuffle

Does it work for you?

P.S. For any attempt to increase efficiency, more sample inputs to match against are needed.
