Matching multiple heading styles using regex

I'm trying to use regex to capture section headings. The regex below captures "4.1 General", but if I add a newline to the end of it, \n([\d\.]+ ?\w+)\n, it no longer captures that line. Why? Is the line not followed by a newline, or am I missing something?

Here's my example for reference:

\n([\d\.]+ ?\w+)

Input

3.6.10
POLLUTION DEGREE 4
continuous conductivity occurs due to conductive dust, rain or other wet conditions
3.6.11
CLEARANCE
shortest distance in air between two conductive parts
3.6.12
CREEPAGE DISTANCE
shortest distance along the surface of a solid insulating material between two conductive
parts
4 Tests
4.1 General
Tests in this standard are TYPE TESTS to be carried out on samples of equipment or parts.

\n([\d\.]+ ?\w+)\n? doesn't seem to work either.

This is a classic case of overlapping matches. The previous match contains \n4 Tests\n, and that trailing \n has already been consumed, so it is no longer available as the leading \n of the next match.
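To see the consumption effect concretely, here is a minimal sketch. It uses Python's re module as a stand-in for the .NET engine (an assumption made for illustration; both engines scan left to right and never reuse a consumed character), and the sample string is a shortened version of the input, assumed to have LF-only line endings.

```python
import re

# Shortened sample from the question, assuming LF-only line endings.
sample = "3.6.12\nCREEPAGE DISTANCE\nparts\n4 Tests\n4.1 General\nTests in this standard\n"

# No trailing \n in the pattern: each heading preceded by a newline is found.
without_trailing = re.findall(r"\n([\d\.]+ ?\w+)", sample)

# Trailing \n in the pattern: the match for "4 Tests" consumes the newline
# that "4.1 General" would need as its leading \n, so that line is skipped.
with_trailing = re.findall(r"\n([\d\.]+ ?\w+)\n", sample)

print(without_trailing)
print(with_trailing)
```

Note that "3.6.12" is not found by either pattern here, because it sits at the very start of the string and has no leading \n.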

I see you want to match whole lines of the text, so it makes more sense to use the ^ and $ anchors with the RegexOptions.Multiline option:

@"(?m)^([\d.]+ ?\w+)\r?$"

See the .NET regex online demo.

Note that in multiline mode, $ in a .NET regex matches only before \n (or at the very end of the string). Since Windows line endings are CRLF, an optional CR, \r?, is required before $.
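A small sketch of why the optional \r matters. Python's re module is again used as a stand-in for .NET (an assumption; in Python's multiline mode, as in .NET's, $ matches before \n but not before \r). The helper name find_headings and the shortened samples are made up for illustration.

```python
import re

# Shortened sample; the second copy simulates Windows CRLF line endings.
lf_text = "3.6.12\nCREEPAGE DISTANCE\n4 Tests\n4.1 General\nTests in this standard"
crlf_text = lf_text.replace("\n", "\r\n")

def find_headings(pattern, text):
    # Hypothetical helper: returns the captured heading of every match.
    return re.findall(pattern, text)

with_cr = r"(?m)^([\d.]+ ?\w+)\r?$"   # pattern from the answer
without_cr = r"(?m)^([\d.]+ ?\w+)$"   # same pattern minus the optional CR

lf_result = find_headings(with_cr, lf_text)
crlf_result = find_headings(with_cr, crlf_text)
crlf_without_cr = find_headings(without_cr, crlf_text)

print(lf_result)         # headings found with LF endings
print(crlf_result)       # same headings found with CRLF endings
print(crlf_without_cr)   # nothing found: \r sits between the heading and $
```

Because the anchors are zero-width, no line break is consumed, so adjacent headings such as "4 Tests" and "4.1 General" are both matched, and the first line of the text is matched as well.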


Have you considered that the newline may not be a single character?

\n([0-9\.]+ ?\w+)(\n|\r)

Using Expresso, the above regex finds 4 matches in your sample; the last one is

[LF]4.1 General[CR]

where [LF] is \n and [CR] is \r.

Keep in mind that [CR], [LF] and [CRLF] are all possible line endings.
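As a rough check of the pattern above, the sketch below runs it against a shortened sample with both CRLF and LF endings. Python's re module stands in for Expresso and .NET here, which is an assumption for illustration only, and the variable names are made up.

```python
import re

# Shortened sample in two flavours: Windows (CRLF) and Unix (LF) line endings.
lf_sample = "3.6.12\nCREEPAGE DISTANCE\nparts\n4 Tests\n4.1 General\nTests in this standard\n"
crlf_sample = lf_sample.replace("\n", "\r\n")

pattern = r"\n([0-9\.]+ ?\w+)(\n|\r)"

# group(0) is the whole match, including the surrounding line-break characters.
crlf_matches = [m.group(0) for m in re.finditer(pattern, crlf_sample)]
lf_headings = [m.group(1) for m in re.finditer(pattern, lf_sample)]

print([repr(m) for m in crlf_matches])
print(lf_headings)
```

On the CRLF sample the trailing group consumes only the \r, so the following \n is still free to start the next match and "4.1 General" is found. On the LF sample the trailing group consumes the \n itself, and "4.1 General" is skipped again.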
