
Extract words from string before new line

I recently asked how to extract the words that come before a number in a string, to help me sort some data. That works perfectly until a string has no number after the words and only a new line.

This code was written by codenewbie:

import re

strings = '''
Hi my name is hazza 50 test test test

Hi hazza 60 test test test

hazza 50 test test test
'''

for s in strings.split('\n'):
    if s != '':
        print(re.findall(r'(.+?)\d',s)[0])

This gives:

Hi my name is hazza 
Hi hazza 
hazza 

This is perfect, but it fails if a string has no number after the words, only a new line:

import re

strings = '''
Hi my name is hazza 50 test test test

Hi hazza 60 test test test

hazza 50 test test test

hazza hazza test test test
'''

for s in strings.split('\n'):
    if s != '':
        print(re.findall(r'(.+?)\d',s)[0])
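
The failure here is an IndexError: re.findall returns an empty list when the line contains no digit, so the [0] lookup has nothing to return. A minimal sketch that reproduces this on single lines (the helper name first_words is only illustrative and is not part of the original code):

```python
import re

def first_words(line):
    # Same pattern as above: lazily capture everything before the first digit.
    matches = re.findall(r'(.+?)\d', line)
    # With no digit in the line, matches is [] and matches[0] would raise IndexError,
    # so this sketch returns None instead.
    return matches[0] if matches else None

print(repr(first_words('hazza 50 test test test')))
print(repr(first_words('hazza hazza test test test')))
```

Returning None for the no-digit case is just one possible choice; it makes the missing match visible without stopping the loop.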

I need it to give me:

Hi my name is hazza 
Hi hazza 
hazza 
hazza hazza

I have tried:

import re

strings = '''
Hi my name is hazza 50 test test test

Hi hazza 60 test test test

hazza 50 test test test

hazza hazza
test test test
'''

while True:
    try:
        for s in strings.split('\n'):
            if s != '':
                print(re.findall(r'(.+?)\d',s)[0])
    except IndexError:
        print(s.split('/n'))

But I am not completely sure where to put the break, or whether there is a better way.

Any help would be greatly appreciated.

Edit:

For example, I have these strings:

Hi my name is hazza 50 test test test

Hi hazza 60 test test test

hazza 50 test test test

hazza hazza
test test test

The code written by codenewbie works fine for the first three strings but not for the last.

I need the output to look like:

Hi my name is hazza 
Hi hazza 
hazza 
hazza hazza

You can use re.match() with the pattern [^\d]* to match the run of non-digit characters at the start of each line:

import re

strings = '''
Hi my name is hazza 50 test test test

Hi hazza 60 test test test

hazza 50 test test test

hazza hazza test test test
'''

for s in strings.splitlines():
    if s != '':
        print(re.match(r'[^\d]*',s)[0])

Prints:

Hi my name is hazza 
Hi hazza 
hazza 
hazza hazza test test test
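
Note that a line with no digit comes back whole, because [^\d]* keeps matching until the end of the line. As a small sketch, the same match can be wrapped in a function that also trims the trailing space left in front of the number. The name words_before_digit is an assumption made for this example and does not appear in the answer above:

```python
import re

def words_before_digit(line):
    # [^\d]* may match the empty string, so re.match never returns None here.
    return re.match(r'[^\d]*', line)[0].rstrip()

for line in ['Hi my name is hazza 50 test test test',
             'hazza hazza test test test',
             '60 test']:
    print(repr(words_before_digit(line)))
```

A line that starts with a digit yields an empty string rather than an error, which may or may not be what the data needs.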

EDIT: Based on the comments, the new version:

import re

strings = '''Hi my name is hazza 50 test test test

Hi hazza 60 test test test

hazza 50 test test test

hazza hazza
test test test
'''

for s in re.findall(r'(.*?)(?:\n\n|\n$)', strings, flags=re.S):
    print(re.match(r'(.*?)(?=\d|\n)', s)[0])

Prints:

Hi my name is hazza 
Hi hazza 
hazza 
hazza hazza
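
One caveat: the lookahead (?=\d|\n) needs a digit or a newline inside each block, so a one-line block with no digit makes re.match return None and the [0] lookup raise a TypeError. The following is a hedged alternative sketch, not the answer's method: it splits the text on blank lines, takes the first line of each block, and keeps whatever precedes the first digit. The function names leading_words and extract_all are hypothetical.

```python
import re

def leading_words(block):
    """Return the text before the first digit on the first line of block."""
    first_line = block.strip().splitlines()[0]
    # [^\d]* always matches (possibly the empty string), so [0] is safe.
    return re.match(r'[^\d]*', first_line)[0].rstrip()

def extract_all(text):
    # Blocks are assumed to be separated by one or more blank lines.
    blocks = [b for b in re.split(r'\n\s*\n', text) if b.strip()]
    return [leading_words(b) for b in blocks]

strings = '''Hi my name is hazza 50 test test test

Hi hazza 60 test test test

hazza 50 test test test

hazza hazza
test test test
'''

for words in extract_all(strings):
    print(words)
```

Unlike the version above, this sketch strips the trailing space before each number, and it returns the whole first line for a one-line block that has no digit.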

