This is my code:
results = re.finditer(r'([A-Z ?]+)\n+(.*)\n', inputfile, flags=re.MULTILINE)
for match in results:
    print(match.groups())
Input:
BASIC INFORMATION
Name: John
Phone No.: +91-9876543210
DOB: 21-10-1995
SKILL SET
Java
Python
Output:
('BASIC INFORMATION', 'Name: John')
('SKILL SET', 'Java')
Required output:
('BASIC INFORMATION', 'Name: John', 'Phone No.: +91-9876543210', 'DOB: 21-10-1995')
('SKILL SET', 'Java', 'Python')
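Replace re.MULTILINE with re.DOTALL so that your .* matches across multiple lines (yes, the flag names are somewhat misleading). Note that a greedy .* under re.DOTALL runs to the end of the input, so make it lazy (.*?) and stop it before the next heading, for example with a lookahead. You'll also want to split your resulting strings on \n.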
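As a rough illustration of the re.DOTALL approach, here is a minimal sketch run against the sample input from the question. The lookahead that ends each section at the next all-caps line, and the names pattern and sections, are my own assumptions rather than part of the original code; re.MULTILINE is kept alongside re.DOTALL so that ^ and $ can anchor the heading lines.

```python
import re

inputfile = """BASIC INFORMATION
Name: John
Phone No.: +91-9876543210
DOB: 21-10-1995
SKILL SET
Java
Python
"""

# Heading: a whole line of capitals and spaces.
# Body: lazily everything up to the next heading line or the end of the text.
pattern = re.compile(
    r'^([A-Z ]+)\n(.*?)(?=^[A-Z ]+$|\Z)',
    flags=re.MULTILINE | re.DOTALL,
)

sections = []
for match in pattern.finditer(inputfile):
    heading, body = match.groups()
    # Split the captured body on newlines and drop empty strings
    lines = [line for line in body.split('\n') if line]
    sections.append((heading,) + tuple(lines))

for section in sections:
    print(section)
```

This assumes every heading sits on a line of its own and that no ordinary line consists only of capitals and spaces.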
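More generally, a regular expression is probably not the best tool for this task; a plain line-by-line loop is simpler:
import string

heading_chars = set(string.ascii_uppercase + ' ')
results = []
for line in inputfile.splitlines():
    if not line.strip():
        continue  # skip blank lines
    if all(c in heading_chars for c in line):
        results.append([line])  # a heading starts a new group
    else:
        results[-1].append(line)  # assumes the input starts with a heading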
It is tough to get all of the output with a single regex because the structure of your file is not simple.
But with a regex plus a little extra effort you can achieve this easily.
# This regex fetches all titles (e.g. BASIC INFORMATION, SKILL SET)
results = re.findall(r"([A-Z ]{4,})", inputfile)
A little more work then gets you the desired result:
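items = []
for z in results:
    # Everything before the next title belongs to the previous section
    item = inputfile[:inputfile.index(z)]
    inputfile = inputfile[len(item):]
    if item:
        items.append([s for s in item.split('\n') if s])
# Whatever remains is the last section
items.append([s for s in inputfile.split('\n') if s])
print(items)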
Output:
[['BASIC INFORMATION', 'Name: John', 'Phone No.: +91-9876543210', 'DOB: 21-10-1995'],
 ['SKILL SET', 'Java', 'Python']]
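The same "find the titles, then cut the text around them" idea can also be sketched with re.split, which keeps the titles in its result when the pattern has a capturing group. This is only an illustration on the sample input; anchoring the title pattern to whole lines with ^ and $ and the names text and parts are my assumptions, not part of the answer above.

```python
import re

text = """BASIC INFORMATION
Name: John
Phone No.: +91-9876543210
DOB: 21-10-1995
SKILL SET
Java
Python
"""

# Because the title pattern is a capturing group, re.split returns the
# titles as well as the text between them.
parts = re.split(r'^([A-Z ]{4,})$', text, flags=re.MULTILINE)

# parts[0] is whatever precedes the first title (empty here);
# after that, titles and section bodies alternate.
items = []
for title, body in zip(parts[1::2], parts[2::2]):
    items.append([title] + [line for line in body.split('\n') if line])

print(items)
```

Unlike the loop above, this leaves the original string untouched, at the cost of assuming each title occupies a full line.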