Extract words from string before new line

I recently asked how to extract the words in a string that come before a number, to help me sort some data. That solution works perfectly until a string has no number after the words, only a new line.

This code was provided by codenewbie:

import re

strings = '''
Hi my name is hazza 50 test test test

Hi hazza 60 test test test

hazza 50 test test test
'''

for s in strings.split('\n'):
    if s != '':
        print(re.findall(r'(.+?)\d',s)[0])

This gives

Hi my name is hazza 
Hi hazza 
hazza 

This is perfect, but it fails with an IndexError if a string has no number after the words, only a new line:

import re

strings = '''
Hi my name is hazza 50 test test test

Hi hazza 60 test test test

hazza 50 test test test

hazza hazza test test test
'''

for s in strings.split('\n'):
    if s != '':
        print(re.findall(r'(.+?)\d',s)[0])

I need it to give me

Hi my name is hazza 
Hi hazza 
hazza 
hazza hazza

I have tried

import re

strings = '''
Hi my name is hazza 50 test test test

Hi hazza 60 test test test

hazza 50 test test test

hazza hazza
test test test
'''

while True:
    try:
        for s in strings.split('\n'):
            if s != '':
                print(re.findall('(.+?)\d',s)[0])
    except IndexError:
        print(s.split('/n'))

But I am not sure where to put the break, or whether there is a better way.

Any help would be greatly appreciated.

Edit:

For example, I have these strings:

Hi my name is hazza 50 test test test

Hi hazza 60 test test test

hazza 50 test test test

hazza hazza
test test test

The code by codenewbie works fine for the first three strings but not the last.

I need the output to look like:

Hi my name is hazza 
Hi hazza 
hazza 
hazza hazza

You can use re.match() with the pattern [^\d]* to match any run of non-digit characters at the start of each line:

import re

strings = '''
Hi my name is hazza 50 test test test

Hi hazza 60 test test test

hazza 50 test test test

hazza hazza test test test
'''

for s in strings.splitlines():
    if s != '':
        print(re.match(r'[^\d]*',s)[0])

Prints:

Hi my name is hazza 
Hi hazza 
hazza 
hazza hazza test test test
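The reason this version cannot raise an IndexError may be worth spelling out: the * quantifier allows zero repetitions, so the pattern matches every line, even one that starts with a digit. A minimal sketch follows; the helper name text_before_digit is made up for illustration, and \D is used as the shorthand for [^\d].

```python
import re

def text_before_digit(line):
    # Hypothetical helper, not part of the original answer.
    # \D* is equivalent to [^\d]*: zero or more non-digit characters.
    # Because * allows zero repetitions, re.match always succeeds here,
    # so indexing the match object never fails.
    return re.match(r'\D*', line)[0]

print(repr(text_before_digit('Hi hazza 60 test test test')))  # 'Hi hazza '
print(repr(text_before_digit('hazza hazza test test test')))  # 'hazza hazza test test test'
print(repr(text_before_digit('50 test')))                     # ''
```

Note that a line without any digit is returned whole, which is why the last line above prints in full.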

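EDIT: Based on the comments, here is a new version that treats each block separated by a blank line as one string and stops at the first digit or line break: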

import re

strings = '''Hi my name is hazza 50 test test test

Hi hazza 60 test test test

hazza 50 test test test

hazza hazza
test test test
'''

for s in re.findall(r'(.*?)(?:\n\n|\n$)', strings, flags=re.S):
    print(re.match(r'(.*?)(?=\d|\n)', s)[0])

Prints:

Hi my name is hazza 
Hi hazza 
hazza 
hazza hazza
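One possible weakness of the version above is that re.match() returns None for a block that contains neither a digit nor a line break, so indexing it would fail. A rough alternative sketch, assuming blocks are always separated by blank lines, splits the text first and then falls back to the whole block when nothing is found. The function name leading_text is hypothetical.

```python
import re

def leading_text(text):
    # Hypothetical helper: returns the text before the first digit
    # or line break in each blank-line-separated block.
    results = []
    for block in re.split(r'\n\s*\n', text.strip()):
        # Look for the first digit or line break in the block.
        m = re.search(r'\d|\n', block)
        # If the block has neither, keep the whole block.
        results.append(block[:m.start()] if m else block)
    return results

sample = '''Hi my name is hazza 50 test test test

Hi hazza 60 test test test

hazza 50 test test test

hazza hazza
test test test
'''

for line in leading_text(sample):
    print(line)
```

For the sample input this prints the same four lines as above, and a block such as 'only words' is returned unchanged instead of raising an error.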
