
JS Regex: find a substring that contains special characters

Can you please help me understand how to do the following?

I have a string that can take one of three formats:

  1. "Section_1: hello & goodbye | section_2: kuku"
  2. "Section_1: hello & goodbye & hola | section_2: kuku"
  3. "Section_1: hello | section_2: kuku"

I want to get the following result:

  1. Group section_1: "hello & goodbye", Group section_2: "kuku"
  2. Group section_1: "hello & goodbye & hola", Group section_2: "kuku"
  3. Group section_1: "hello", Group section_2: "kuku"

I currently have this regex, but it does not work for me because of the '&':

Section_1:\s*(?<section_1>\w+)(\s*\|\s*(Section_2:(\s*(?<section_2>.*))?)?)?

Note: the regex captures two named groups, "section_1" and "section_2".

The question is: how can I match a substring that can contain zero or more occurrences of " & {word}"?

Thanks in advance.

As established in the comments, the ' & ' combination acts as a delimiter between words. There are probably a ton of ways to write a pattern that captures these substrings, but to me they fall into two groups: extensive or simple. If you need to validate the input more thoroughly, you could use:

^section_1:\s*(?<section_1>[a-z]+(?:\s&\s[a-z]+)*)\s*\|\s*section_2:\s*(?<section_2>[a-z]+(?:\s&\s[a-z]+)*)$

See an online demo. The pattern means:

  • ^ - Start-line anchor;
  • section_1:\s* - Match 'section_1:' literally (the case-insensitive flag lets it match 'Section_1:' as well), followed by 0+ whitespace characters;
  • (?<section_1>[a-z]+(?:\s&\s[a-z]+)*) - A named capture group that matches [a-z]+, i.e. 1+ upper/lower letters (case-insensitive flag), followed by a nested non-capture group (?:\s&\s[a-z]+)* that matches, 0+ times, the delimiter described above followed by another word;
  • \s*\|\s*section_2:\s* - Match 0+ whitespace characters, a literal pipe symbol, 0+ whitespace characters, and 'section_2:' literally, followed by 0+ whitespace characters;
  • (?<section_2>[a-z]+(?:\s&\s[a-z]+)*) - A 2nd named capture group that matches the same pattern as the named capture group above;
  • $ - End-line anchor.
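To see how the pattern broken down above behaves in JavaScript, here is a minimal sketch run against the three sample strings from the question. It assumes the `i` flag is set, since the pattern is written in lowercase while the input starts with 'Section_1'. The helper name `parseSections` is hypothetical and only used for illustration.

```javascript
// Strict pattern from the answer; the `i` flag makes it case-insensitive.
const strictPattern =
  /^section_1:\s*(?<section_1>[a-z]+(?:\s&\s[a-z]+)*)\s*\|\s*section_2:\s*(?<section_2>[a-z]+(?:\s&\s[a-z]+)*)$/i;

// Hypothetical helper: returns both named groups, or null if the input is rejected.
function parseSections(input) {
  const match = strictPattern.exec(input);
  if (match === null) {
    return null;
  }
  const { section_1, section_2 } = match.groups;
  return { section_1, section_2 };
}

const samples = [
  "Section_1: hello & goodbye | section_2: kuku",
  "Section_1: hello & goodbye & hola | section_2: kuku",
  "Section_1: hello | section_2: kuku",
];

for (const sample of samples) {
  console.log(parseSections(sample));
}
// { section_1: 'hello & goodbye', section_2: 'kuku' }
// { section_1: 'hello & goodbye & hola', section_2: 'kuku' }
// { section_1: 'hello', section_2: 'kuku' }
```

Because the pattern is anchored and only allows letters around ' & ', an input such as "Section_1: hello && goodbye | section_2: kuku" is rejected and the helper returns null.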

Note: As mentioned, there are a ton of different patterns one could use, depending on how strictly you need to validate the input. For example: \s*(?<section_1>[^:|]+?)\s*\|\s*[^:]*:\s*(?<section_2>.+) may also work.
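As a rough comparison, the simpler pattern from the note can be tried the same way. This sketch assumes no flags are needed, because the pattern does not spell out the section labels; the helper name `parseLoosely` is hypothetical. It returns the same groups for the sample strings, but it also accepts input that a stricter pattern would reject.

```javascript
// Simple pattern from the note: no anchors, no literal section labels.
const loosePattern =
  /\s*(?<section_1>[^:|]+?)\s*\|\s*[^:]*:\s*(?<section_2>.+)/;

// Hypothetical helper: returns both named groups, or null if nothing matches.
function parseLoosely(input) {
  const match = loosePattern.exec(input);
  if (match === null) {
    return null;
  }
  const { section_1, section_2 } = match.groups;
  return { section_1, section_2 };
}

console.log(parseLoosely("Section_1: hello & goodbye | section_2: kuku"));
// { section_1: 'hello & goodbye', section_2: 'kuku' }

// No validation of labels or words: this input is accepted too.
console.log(parseLoosely("Section_1: hello && 123 | whatever: kuku!"));
// { section_1: 'hello && 123', section_2: 'kuku!' }
```

The trade-off is that the simple version is shorter and more tolerant, while it does nothing to check that the labels or the ' & ' delimiters are well formed.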
