preg_match regex to match part of a string that may or may not be present

I am trying to parse some text reports into structured data. Typical lines are:

 Cat. No.: 1      Location: Bottles, boxes etc
 Cat. No.: 25      Location: Woods size B      EBN: 63.1868
 Cat. No.: 24      Location: Woods size B      EBN: 12.1980.221
 Cat. No.: 20      Location: Woods size B      EBN: 4.1973
 Cat. No.: 19      Location: Woods size B

The first two values are always present; the last is optional.

/Cat\. No\.: (\d+)      Location: (.+)(?:      EBN: ([\d\.]+))/

works for lines with all three values, but my instinct is that I need to add a ? to the end to make the last part optional, i.e.

/Cat\. No\.: (\d+)      Location: (.+)(?:      EBN: ([\d\.]+))?/

I then find that capture group 2 matches everything after 'Location: ', so for line 2, for example, it becomes 'Woods size B EBN: 63.1868'.

I have saved this at https://regex101.com/r/gd0pKH/1 and would be grateful for any advice. "RegEx to match part of string that may or may not be present" appears to be the same question and has the same answer I came up with, but for some reason it doesn't seem to be working for me!

You could fix your regex with the following steps:

  1. The second capture group ( (.+) ) should be ungreedy (lazy), or it will match everything up to the end of the line: (.+?)

  2. You should add the end-of-line anchor $ ; otherwise the regex stops at the first successful match, which is the shortest one: the lazy group takes a single character, the optional group matches nothing, and your third capture group stays empty.
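The effect of each step can be checked in isolation. The following is a minimal sketch, assuming PHP's preg_match and the second sample line from the question; the variable names are illustrative only and are not part of the original post.

```php
<?php
// Sample line from the question (six spaces between the fields).
$line = 'Cat. No.: 25      Location: Woods size B      EBN: 63.1868';

// Greedy (.+) with an optional tail: group 2 swallows the EBN part.
preg_match('/Cat\. No\.: (\d+)      Location: (.+)(?:      EBN: ([\d.]+))?/', $line, $greedy);

// Lazy (.+?) without an end anchor: group 2 stops after one character.
preg_match('/Cat\. No\.: (\d+)      Location: (.+?)(?:      EBN: ([\d.]+))?/', $line, $lazy);

var_dump($greedy[2]); // the location plus the trailing EBN text
var_dump($lazy[2]);   // a single character
```

In both cases the optional group is allowed to match nothing, so the engine never has a reason to backtrack into group 2. That is why the lazy quantifier and the anchor are needed together.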

Altogether, you get this:

Cat\. No\.: (\d+)      Location: (.+?)(?:      EBN: ([\d\.]+))?$

In addition, you could think about using \s+ instead of the six spaces, which makes the expression more flexible.

Cat\. No\.: (\d+)\s+Location: (.+?)(?:\s+EBN: ([\d\.]+))?$
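As a rough illustration of how this pattern could be used to build structured records, here is a short sketch with preg_match. The array keys (cat_no, location, ebn) and variable names are assumptions made for the example, not something from the original reports.

```php
<?php
// Sample lines from the question.
$lines = [
    'Cat. No.: 1      Location: Bottles, boxes etc',
    'Cat. No.: 25      Location: Woods size B      EBN: 63.1868',
    'Cat. No.: 24      Location: Woods size B      EBN: 12.1980.221',
    'Cat. No.: 19      Location: Woods size B',
];

// Lazy location, optional EBN group, anchored at the end of the line.
$pattern = '/Cat\. No\.: (\d+)\s+Location: (.+?)(?:\s+EBN: ([\d.]+))?$/';

$records = [];
foreach ($lines as $line) {
    if (preg_match($pattern, $line, $m)) {
        $records[] = [
            'cat_no'   => (int) $m[1],
            'location' => $m[2],
            // Group 3 is absent from $m when the line has no EBN.
            'ebn'      => $m[3] ?? null,
        ];
    }
}

print_r($records);
```

Note that preg_match drops trailing groups that did not participate in the match, so the null-coalescing operator is a simple way to handle lines without an EBN.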

You can have the Location value repeat lazily, and then use a positive lookahead for either two spaces in a row (for a line with EBN) or the end of the line (for a line without EBN):

Cat\. No\.: (\d+)      Location: (.+?)(?=  |$)(?:      EBN: ([\d\.]+))?

https://regex101.com/r/gd0pKH/2
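A small sketch of the lookahead variant in PHP, assuming two of the sample lines from the question (variable names are illustrative):

```php
<?php
// Lazy location that must be followed by two spaces or the end of the line.
$pattern = '/Cat\. No\.: (\d+)      Location: (.+?)(?=  |$)(?:      EBN: ([\d.]+))?/';

$withEbn    = 'Cat. No.: 20      Location: Woods size B      EBN: 4.1973';
$withoutEbn = 'Cat. No.: 19      Location: Woods size B';

preg_match($pattern, $withEbn, $a);
preg_match($pattern, $withoutEbn, $b);

var_dump($a[2], $a[3] ?? null); // location and EBN
var_dump($b[2], $b[3] ?? null); // location, no EBN
```

One caveat with this approach: a Location value that itself contains two consecutive spaces would be cut short at that point, so it relies on the field separator being the only run of multiple spaces.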
