Regular expression for extracting multiple patterns

Question

I have a string like this:

import re

string="""Claim Status\r\n[Primary Status: Paidup to Rebilled]\r\nGeneral Info.\r\n[PA Number: #######]\r\nClaim Insurance: Modified\r\n[Ins. Mode: Primary], [Corrected Claim Checked], [ICN: #######], [Id: ########]"""

tokens=re.findall(r'(.*)\r\n(.*?:)(.*?])', string)

Output:

 ('Claim Status', '[Primary Status:', ' Paidup to Rebilled]')
 ('General Info.', '[PA Number:', ' R180126187]')
 ('Claim Insurance: Modified', '[Ins. Mode:', ' Primary]')

Wanted output:

 ('Claim Status', 'Primary Status:Paidup to Rebilled')
 ('General Info.', 'PA Number:R180126187')
 ('Claim Insurance: Modified', 'Ins. Mode:Primary','ICN: ########', 'Id: #########')

Answer 1

You may achieve what you need with a solution like this:

import re
s="""Claim Status\r\n[Primary Status: Paidup to Rebilled]\r\nGeneral Info.\r\n[PA Number: #######]\r\nClaim Insurance: Modified\r\n[Ins. Mode: Primary], [Corrected Claim Checked], [ICN: #######], [Id: ########]"""
res = []
for m in re.finditer(r'^(.+)(?:\r?\n\s*\[(.+)])?\r?$', s, re.M):
    t = []
    t.append(m.group(1).strip())
    if m.group(2):
        t.extend([x.strip() for x in m.group(2).strip().split('], [') if ':' in x])
    res.append(tuple(t))
print(res)

Output:

[('Claim Status', 'Primary Status: Paidup to Rebilled'), ('General Info.', 'PA Number: #######'), ('Claim Insurance: Modified', 'Ins. Mode: Primary', 'ICN: #######', 'Id: ########')]

The regex ^(.+)(?:\r?\n\s*\[(.+)])?\r?$ matches two consecutive lines, the second one being optional (because of the optional non-capturing group (?:...)?). The first line is captured into Group 1, and the following line (which starts with [ and ends with ]) is captured into Group 2. Note that \r?$ is necessary because in multiline mode $ matches only before a newline (\n), not before a carriage return (\r). Also, . matches \r, so Group 1 can end with a stray \r; that is why it is stripped.

The Group 1 value is added to a temporary list. Then the contents of Group 2 are split on ], [ (if you are not sure about the amount of whitespace, you may use re.split(r']\s*,\s*\[', m.group(2)) instead), and only the items that contain a : are added to the temporary list.
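Both details (the \r?$ anchor and the whitespace-tolerant split) can be checked in isolation with a short sketch. The sample strings are hypothetical, trimmed or adapted from the question's input; group2 stands in for what m.group(2) might hold if the separators had irregular spacing.

```python
import re

text = "Claim Status\r\n[Primary Status: Paidup to Rebilled]"

# In re.M mode, $ matches only right before "\n" (or at the end of the string),
# so on a "\r\n" line ending the "\r" sits between the text and $.
without_cr = re.search(r'^Claim Status$', text, re.M)
with_cr = re.search(r'^Claim Status\r?$', text, re.M)
print(without_cr)               # None
print(repr(with_cr.group(0)))   # 'Claim Status\r'

# Hypothetical Group 2 value with uneven whitespace around the commas.
group2 = "Ins. Mode: Primary],[Corrected Claim Checked] ,  [ICN: #######"
items = re.split(r']\s*,\s*\[', group2)
# Keep only "key: value" items, as in the answer's loop.
fields = [x.strip() for x in items if ':' in x]
print(fields)  # ['Ins. Mode: Primary', 'ICN: #######']
```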

Answer 2

You are getting three elements per result because your regex has three capturing groups, and re.findall returns one tuple item per capturing group. Rewrite your regex like this to combine the second and third groups into one:

re.findall('(.*)\r\n((?:.*?:)(?:.*?]))',string)

A group delimited by (?:...) (instead of (...)) is "non-capturing", i.e. it does not get a group number, so it cannot be referenced with a backreference such as \1, and re.findall does not return it. I have made your second and third groups non-capturing and wrapped them together in a single capturing (regular) group.
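A small sketch makes the difference visible by running both patterns on a shortened copy of the question's string (only the first two records are kept, for brevity). Note that the brackets remain part of the second item.

```python
import re

s = "Claim Status\r\n[Primary Status: Paidup to Rebilled]\r\nGeneral Info.\r\n[PA Number: #######]"

# Original pattern: three capturing groups -> three items per tuple.
three = re.findall(r'(.*)\r\n(.*?:)(.*?])', s)

# Inner groups made non-capturing and wrapped in one capturing group -> two items.
two = re.findall(r'(.*)\r\n((?:.*?:)(?:.*?]))', s)

print(three)
# [('Claim Status', '[Primary Status:', ' Paidup to Rebilled]'), ('General Info.', '[PA Number:', ' #######]')]
print(two)
# [('Claim Status', '[Primary Status: Paidup to Rebilled]'), ('General Info.', '[PA Number: #######]')]
```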

2 answers

Answer 1: score 2, accepted, 2018-09-03 08:36:58

Answer 2: score 0, 2018-09-03 08:46:52