
Need help fixing a regular expression for a table of contents

I have to parse a document that contains a table of contents. The generated document also contains text that is not part of the table of contents, e.g. headers and footers.



2.1 some_text 100
2.1. some_text 100
some_text 100

I have written a regex to check whether a line is part of the table of contents:


(\d+(\.\d*)?)(.*)(\d{1,3})

However, it matches all of the lines above. I want it to fail on the third line, i.e. some_text 100.

Please help.

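Your regex is not anchored, so the match can start anywhere in the line; in the third line it matches the trailing 100 on its own. Anchor it to the start of the line with ^ in multiline mode: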

(?m)^(\d+(\.\d*)?)(.*)(\d{1,3})
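To see the effect of the anchor, here is a minimal Python sketch. It assumes the lines are checked with `re.search` (the question does not say which language or matching function is used), and the names `sample_lines`, `unanchored` and `anchored` are only illustrative.

```python
import re

# The three sample lines from the question.
sample_lines = ["2.1 some_text 100", "2.1. some_text 100", "some_text 100"]

unanchored = re.compile(r"(\d+(\.\d*)?)(.*)(\d{1,3})")
anchored = re.compile(r"(?m)^(\d+(\.\d*)?)(.*)(\d{1,3})")

# Without an anchor the search may start anywhere in the line,
# so the trailing "100" alone is enough to satisfy the pattern.
unanchored_hits = [bool(unanchored.search(line)) for line in sample_lines]

# With ^ the leading number must sit at the start of the line.
anchored_hits = [bool(anchored.search(line)) for line in sample_lines]

print(unanchored_hits)  # [True, True, True]
print(anchored_hits)    # [True, True, False]
```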


You might also want to require that the page number is at the end of the line, using the $ anchor:

(?m)^\d+(?:\.\d*)?.*\d{1,3}$
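As a rough illustration of how this regex could filter a whole page of text, the Python sketch below runs it over a made-up document. The header, footer and "Introduction" lines are hypothetical; only the three `some_text` lines come from the question.

```python
import re

# Hypothetical page text: a header, some candidate lines, and a footer.
document = "\n".join([
    "Annual Report",
    "1 Introduction 5",
    "2.1 some_text 100",
    "2.1. some_text 100",
    "some_text 100",
    "Page 3 of 120",
])

toc_line = re.compile(r"(?m)^\d+(?:\.\d*)?.*\d{1,3}$")

# The pattern has no capturing groups, so findall returns each matching line whole.
toc_entries = toc_line.findall(document)

print(toc_entries)
# ['1 Introduction 5', '2.1 some_text 100', '2.1. some_text 100']
```

Because `.` does not match a newline by default, each match stays within a single line, and `(?m)` lets ^ and $ apply to every line of the text instead of only the start and end of the whole string.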

Note that I removed all capturing groups from the last regex to keep it clean. If you plan to use the captured text, you can restore them.
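If you do restore the groups, be aware that the greedy `.*` leaves only the final digit for the last group. The Python sketch below shows this, followed by one possible variant that is not part of the regex above: it assumes the section number, title and page number are separated by spaces or tabs, and it uses a lazy title group so that the page number is captured whole.

```python
import re

# The $-anchored regex with the capturing groups restored.
with_groups = re.compile(r"(?m)^(\d+(?:\.\d*)?)(.*)(\d{1,3})$")

# Greedy (.*) swallows "10", leaving only the last digit for the page group.
greedy = with_groups.search("2.1 some_text 100").groups()
print(greedy)  # ('2.1', ' some_text 10', '0')

# Hypothetical variant (an assumption, not from the answer):
# - the section number may end with a dot, as in "2.1."
# - fields are separated by spaces or tabs
# - the title group is lazy, so the trailing number stays intact
lazy_groups = re.compile(r"(?m)^(\d+(?:\.\d+)*\.?)[ \t]+(.*?)[ \t]+(\d{1,3})$")

lazy = lazy_groups.search("2.1. some_text 100").groups()
print(lazy)  # ('2.1.', 'some_text', '100')
```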
