
Searching for multiple repeating patterns of text using regular expressions

I am trying to search a document for text that has repeating portions and occurs multiple times. However, regex.search returns only the first match in the document, not the others.

The patterns I want to search for look like this:

clauses 5.3, 12 & 15
clause 10 C, 10 CA & 10 CC

The following line shows the regular expression which I am using.

regex_crossref_multiple_1=r'(clause|Clause|clauses|Clauses)\s*\d+[.]?\d*\s*[a-zA-Z]*((,|&|and)\s*\d+[.]?\d*\s*[A-Z]*)+'

The code used for matching and the results are shown below:

cross=regex.search(regex_crossref_multiple_1,des)

(des is a string containing the text.)

To print the result, I am using print(cross.group()).

Result:

clauses 5.3, 12 & 15

However, des contains other occurrences of these patterns that do not appear in the result.

What could be the problem?

The input string (des) can be found at the following link; its text is reproduced below.

https://docs.google.com/document/d/1LPmYaD6VE724OYoXDGPfInvx8WTu5JfrTqTOIv8zAlg/edit?usp=sharing

In case, the contractor completes the work ahead of stipulated date of
completion or justified extended date of completion as determined
under clauses 5.3, 12 & 15, a bonus @ 0.5 % (zero point five per cent) of
the tendered value per month computed on per day basis, shall be
payable to the contractor, subject to a maximum limit of 2 % (two
percent) of the tendered value. Provided that justified time for extra
work shall be calculated on pro-rata basis as cost of extra work excluding
amount payable/ paid under clause 10 C, 10 CA & 10 CC  X stipulated
period /tendered value. The amount of bonus, if payable, shall be paid
along with final bill after completion of work. Provided always that
provision of the Clause 2A shall be applicable only when so provided in
‘Schedule F’
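A minimal sketch of why only one result appears, assuming the standard-library re module (if regex in the question is the third-party regex package, it provides search, finditer and findall with the same behaviour). des here is a short excerpt of the text above, not the full document. re.search stops at the first match and returns a single Match object, whereas re.finditer yields every non-overlapping match.

```python
import re

# The pattern from the question, unchanged.
regex_crossref_multiple_1 = r'(clause|Clause|clauses|Clauses)\s*\d+[.]?\d*\s*[a-zA-Z]*((,|&|and)\s*\d+[.]?\d*\s*[A-Z]*)+'

# Assumption: a shortened excerpt of des, copied from the text above.
des = (
    "under clauses 5.3, 12 & 15, a bonus @ 0.5 % (zero point five per cent) of\n"
    "amount payable/ paid under clause 10 C, 10 CA & 10 CC  X stipulated\n"
    "provision of the Clause 2A shall be applicable only when so provided in\n"
)

# re.search scans for the first match only and returns one Match (or None).
first = re.search(regex_crossref_multiple_1, des)
print(first.group())

# re.finditer yields a Match object for every non-overlapping match.
all_matches = [m.group() for m in re.finditer(regex_crossref_multiple_1, des)]
print(all_matches)

# Because this pattern contains capturing groups, re.findall returns tuples
# of the captured groups rather than the full matched text.
print(re.findall(regex_crossref_multiple_1, des))
```

With this excerpt, finditer also shows that the original pattern cuts the second reference short at "clause 10 C, 10 CA": inside the repeated group the \s* comes before [A-Z]*, so the space between CA and & is never consumed and the next repetition cannot start.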

You could match clause or clauses followed by digits with an optional decimal part and optional chars A-Z, and then use a repeating pattern to match an optional following comma and digits.

For the last part of the pattern, you can optionally match either a comma, & or and, followed by digits and optional chars A-Z.

\b[Cc]lauses?\s+\d+(?:\.\d+)?(?:\s*[A-Z]+)?(?:,\s+\d+(?:\.\d+)?(?:\s*[A-Z]+)?)*(?:\s+(?:[,&]|and)\s+\d+(?:\.\d+)?(?:\s*[A-Z]+)?)?\b
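The pattern above repeats the same "reference" piece (digits, an optional decimal part, optional capital letters) three times. As a sketch of one way to keep it readable, assuming Python's re module, it can be assembled from a single hypothetical REF sub-pattern; the result is the same string as the pattern above.

```python
import re

# Hypothetical helper name: one reference such as "5.3", "12" or "10 CA".
REF = r"\d+(?:\.\d+)?(?:\s*[A-Z]+)?"

# First reference, then any number of ", <ref>", then an optional
# final "<ws>,|&|and<ws> <ref>", wrapped in word boundaries.
pattern = (
    rf"\b[Cc]lauses?\s+{REF}"
    rf"(?:,\s+{REF})*"
    rf"(?:\s+(?:[,&]|and)\s+{REF})?\b"
)

# Both example strings from the question should match in full.
for text in ["clauses 5.3, 12 & 15", "clause 10 C, 10 CA & 10 CC"]:
    print(text, "->", bool(re.fullmatch(pattern, text)))
```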

Explanation

  • \b Word boundary
  • [Cc]lauses?\s+\d+(?:\.\d+)? Match clause or clauses followed by digits and an optional decimal part
  • (?:\s*[A-Z]+)? Optionally match whitespace chars and 1+ chars A-Z
  • (?: Non-capture group
    • ,\s+\d+(?:\.\d+)? Match a comma, whitespace, digits and an optional decimal part
    • (?:\s*[A-Z]+)? Optionally match whitespace chars and 1+ chars A-Z
  • )* Close the group and repeat 0+ times
  • (?: Non-capture group
    • \s+(?:[,&]|and) Match 1+ whitespace chars and either a comma, & or and
    • \s+\d+(?:\.\d+)? Match 1+ whitespace chars and 1+ digits with an optional decimal part
    • (?:\s*[A-Z]+)? Optionally match whitespace chars and 1+ chars A-Z
  • )? Close the group and make it optional
  • \b Word boundary
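A minimal sketch of applying this pattern to every occurrence rather than only the first, assuming Python's re module and a shortened excerpt of des (not the full document). Using re.finditer (or re.findall) instead of re.search is what returns all the matches.

```python
import re

# The answer's pattern; all of its groups are non-capturing.
pattern = r"\b[Cc]lauses?\s+\d+(?:\.\d+)?(?:\s*[A-Z]+)?(?:,\s+\d+(?:\.\d+)?(?:\s*[A-Z]+)?)*(?:\s+(?:[,&]|and)\s+\d+(?:\.\d+)?(?:\s*[A-Z]+)?)?\b"

# Assumption: a shortened excerpt of des from the question.
des = (
    "under clauses 5.3, 12 & 15, a bonus @ 0.5 % (zero point five per cent) of\n"
    "amount payable/ paid under clause 10 C, 10 CA & 10 CC  X stipulated\n"
    "provision of the Clause 2A shall be applicable only when so provided in\n"
)

# re.search would stop at the first hit; finditer walks every match.
for m in re.finditer(pattern, des):
    print(m.start(), m.group())

# With no capturing groups, findall returns the full matched strings.
refs = re.findall(pattern, des)
print(refs)
```

On this excerpt the pattern finds three references, including Clause 2A. Because every group is non-capturing, findall returns whole matches; with the question's original pattern it would return tuples of the captured groups instead.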

