I recently asked how to extract the words that come before a number in a string, to help me sort some data. The solution works perfectly until a line has no number and the words are followed only by a new line.
This was the solution from codenewbie:
import re
strings = '''
Hi my name is hazza 50 test test test
Hi hazza 60 test test test
hazza 50 test test test
'''
for s in strings.split('\n'):
    if s != '':
        print(re.findall(r'(.+?)\d', s)[0])
This gives
Hi my name is hazza
Hi hazza
hazza
Which is perfect, but it fails if a string has no number after the words, only a new line:
import re
strings = '''
Hi my name is hazza 50 test test test
Hi hazza 60 test test test
hazza 50 test test test
hazza hazza test test test
'''
for s in strings.split('\n'):
    if s != '':
        print(re.findall(r'(.+?)\d', s)[0])
I need it to give me
Hi my name is hazza
Hi hazza
hazza
hazza hazza
I have tried
import re
strings = '''
Hi my name is hazza 50 test test test
Hi hazza 60 test test test
hazza 50 test test test
hazza hazza
test test test
'''
while True:
    try:
        for s in strings.split('\n'):
            if s != '':
                print(re.findall('(.+?)\d',s)[0])
    except IndexError:
        print(s.split('/n'))
But I am not completely sure where to put the break, or whether there is a better way.
Any help would be greatly appreciated.
Edit:
I have these strings, for example:
Hi my name is hazza 50 test test test
Hi hazza 60 test test test
hazza 50 test test test
hazza hazza
test test test
The code by codenewbie works fine for the first three strings but not the last.
I need the output to look like:
Hi my name is hazza
Hi hazza
hazza
hazza hazza
You can use re.match() with the pattern [^\d]* to match any leading non-digit characters:
import re
strings = '''
Hi my name is hazza 50 test test test
Hi hazza 60 test test test
hazza 50 test test test
hazza hazza test test test
'''
for s in strings.splitlines():
    if s != '':
        print(re.match(r'[^\d]*', s)[0])
Prints:
Hi my name is hazza
Hi hazza
hazza
hazza hazza test test test
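The reason this version does not raise on a line without a number is that [^\d]* may match zero characters, so re.match() always returns a match object, whereas re.findall() with the original pattern returns an empty list and indexing it with [0] raises IndexError. A minimal sketch of the difference follows; the helper name prefix_before_digit and the trailing rstrip() are assumptions added for illustration, not part of the code above.

```python
import re

def prefix_before_digit(line):
    # Hypothetical helper: [^\d]* matches zero or more non-digit characters,
    # so re.match() succeeds even when the line contains no digit at all.
    # rstrip() is an extra step that drops the space left before the number.
    return re.match(r'[^\d]*', line)[0].rstrip()

print(prefix_before_digit('Hi hazza 60 test test test'))  # Hi hazza
print(prefix_before_digit('hazza hazza test test test'))  # hazza hazza test test test

# The original pattern requires a digit, so findall() finds nothing here
# and [0] on the empty result would raise IndexError.
print(re.findall(r'(.+?)\d', 'hazza hazza test test test'))  # []
```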
EDIT: Based on the comments, the new version:
import re
strings = '''Hi my name is hazza 50 test test test

Hi hazza 60 test test test

hazza 50 test test test

hazza hazza
test test test
'''
for s in re.findall(r'(.*?)(?:\n\n|\n$)', strings, flags=re.S):
    print(re.match(r'(.*?)(?=\d|\n)', s)[0])
Prints:
Hi my name is hazza
Hi hazza
hazza
hazza hazza
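The same idea can also be written without the lookahead, which some may find easier to read. This is only a sketch under the assumption that records are separated by blank lines and that the wanted text is always on the first line of a record; the function name names_from_records is made up for illustration.

```python
import re

def names_from_records(text):
    # Assumption: records are separated by one or more blank lines.
    results = []
    for record in re.split(r'\n\s*\n', text.strip()):
        if not record.strip():
            continue  # skip empty input
        first_line = record.splitlines()[0]
        # Keep everything before the first digit, or the whole line
        # when it contains no digit; rstrip() drops the trailing space.
        results.append(re.split(r'\d', first_line, maxsplit=1)[0].rstrip())
    return results

sample = (
    "Hi my name is hazza 50 test test test\n\n"
    "Hi hazza 60 test test test\n\n"
    "hazza 50 test test test\n\n"
    "hazza hazza\ntest test test\n"
)
for name in names_from_records(sample):
    print(name)
```

Because re.split() returns the whole string when the pattern is not found, no try/except or break is needed for records without a number.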