How do I form separate blocks using regular expressions in Python?

Question

This is my code:

import re

results = re.finditer(r'([A-Z ?]+)\n+(.*)\n', inputfile, flags=re.MULTILINE)

for match in results:
    print(match.groups())

Input:

BASIC INFORMATION

Name: John

Phone No.: +91-9876543210

DOB: 21-10-1995

SKILL SET

Java

Python

Output: ('BASIC INFORMATION', 'Name: John') ('SKILL SET', 'Java')

Required output: ('BASIC INFORMATION', 'Name: John', 'Phone No.: +91-9876543210', 'DOB: 21-10-1995') ('SKILL SET', 'Java', 'Python')

Answer 1

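Replace re.MULTILINE with re.DOTALL so that your .* matches across multiple lines (yes, the flag names are somewhat misleading). Keep in mind that .* is greedy, so with re.DOTALL it runs to the end of the input unless you make it non-greedy and stop it at the next heading. You'll also want to split the resulting strings on \n.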
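As a rough illustration of the regex route, here is a minimal sketch that pairs re.DOTALL with a non-greedy body and a lookahead for the next heading. The names sample, BLOCK_RE and split_blocks are made up for this example, and it assumes that a heading is any line consisting only of capital letters and spaces.

```python
import re

# Hypothetical sample mirroring the input in the question.
sample = """BASIC INFORMATION

Name: John

Phone No.: +91-9876543210

DOB: 21-10-1995

SKILL SET

Java

Python
"""

# Assumption: a heading is a whole line of capitals and spaces.
# The body is matched non-greedily and stops at the next heading line
# or at the end of the input.
BLOCK_RE = re.compile(
    r'^([A-Z][A-Z ]*)\n(.*?)(?=^[A-Z][A-Z ]*$|\Z)',
    flags=re.MULTILINE | re.DOTALL
)


def split_blocks(text):
    """Return one tuple per block: (heading, line, line, ...)."""
    blocks = []
    for match in BLOCK_RE.finditer(text):
        heading, body = match.groups()
        # Split the body into lines and drop the blank ones.
        lines = [line.strip() for line in body.split('\n') if line.strip()]
        blocks.append((heading.strip(),) + tuple(lines))
    return blocks


for block in split_blocks(sample):
    print(block)
# ('BASIC INFORMATION', 'Name: John', 'Phone No.: +91-9876543210', 'DOB: 21-10-1995')
# ('SKILL SET', 'Java', 'Python')
```

re.MULTILINE is still needed here so that ^ and $ anchor at line boundaries, while re.DOTALL lets the body span several lines.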

More generally, a regular expression is probably not the best tool for this task. A plain loop over the lines should work better:

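import string

HEADING_CHARS = set(string.ascii_uppercase + ' ')

results = []
for line in inputfile.splitlines():
    line = line.strip()
    if not line:
        continue  # skip blank lines; all() on an empty string would be True
    if all(c in HEADING_CHARS for c in line):
        results.append([line])  # an all-caps line starts a new block
    elif results:
        results[-1].append(line)  # lines before the first heading are ignored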

Answer 2

It is hard to get the whole output with a single regex because the structure of your text is not simple.

But with a regex plus a little extra effort you can achieve this easily:

import re

# This regex fetches all titles (e.g. BASIC INFORMATION, SKILL SET)
results = re.findall(r"([A-Z ]{4,})", inputfile)

A little more work then gets you the desired result:

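items = []
rest = inputfile
for title in results:
    # Everything before this title belongs to the previous block.
    start = rest.index(title)
    block, rest = rest[:start], rest[start:]
    if block:
        items.append([line for line in block.split('\n') if line])
# Whatever is left after the last title is the final block.
items.append([line for line in rest.split('\n') if line])
print(items)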

Output:
[['BASIC INFORMATION', 'Name: John', 'Phone No.: +91-9876543210', 'DOB: 21-10-1995'],
 ['SKILL SET', 'Java', 'Python']]
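The same find-the-titles-then-slice idea can probably be written more compactly with re.split, because a capturing group in the pattern keeps the matched titles in the result. This is only a sketch: the function name split_on_titles is made up for this example, and it assumes that every title sits alone on its line and is at least four characters long, as in the regex above.

```python
import re


def split_on_titles(text):
    """Split text into [title, line, line, ...] lists (hypothetical helper)."""
    # With a capturing group, re.split returns
    # [text_before_first_title, title1, body1, title2, body2, ...].
    parts = re.split(r'^([A-Z ]{4,})$', text, flags=re.MULTILINE)
    blocks = []
    # Pair each title (odd indexes) with the body that follows it (even indexes).
    for title, body in zip(parts[1::2], parts[2::2]):
        lines = [line.strip() for line in body.splitlines() if line.strip()]
        blocks.append([title.strip()] + lines)
    return blocks


text = "BASIC INFORMATION\nName: John\nDOB: 21-10-1995\nSKILL SET\nJava\nPython\n"
print(split_on_titles(text))
# [['BASIC INFORMATION', 'Name: John', 'DOB: 21-10-1995'], ['SKILL SET', 'Java', 'Python']]
```

Anchoring the title pattern with ^ and $ under re.MULTILINE avoids accidental matches on runs of capitals inside an ordinary line.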

2 answers

solution1
0 2017-06-14 10:09:41

solution2
0 ACCEPTED 2017-06-14 10:28:21
