Extract words from string before new line
I recently asked how to extract the words in a string that come before a number, to help me sort some data. This works perfectly until a line has no number and the words are followed only by a new line.
This was done by codenewbie:
import re
strings = '''
Hi my name is hazza 50 test test test
Hi hazza 60 test test test
hazza 50 test test test
'''
for s in strings.split('\n'):
    if s != '':
        print(re.findall(r'(.+?)\d', s)[0])
This gives
Hi my name is hazza
Hi hazza
hazza
This is perfect, but it fails if a string has no number and the words are followed only by a new line:
import re
strings = '''
Hi my name is hazza 50 test test test
Hi hazza 60 test test test
hazza 50 test test test
hazza hazza test test test
'''
for s in strings.split('\n'):
    if s != '':
        print(re.findall(r'(.+?)\d', s)[0])
I need it to give me
Hi my name is hazza
Hi hazza
hazza
hazza hazza
I have tried
import re
strings = '''
Hi my name is hazza 50 test test test
Hi hazza 60 test test test
hazza 50 test test test
hazza hazza
test test test
'''
while True:
    try:
        for s in strings.split('\n'):
            if s != '':
                print(re.findall(r'(.+?)\d', s)[0])
    except IndexError:
        print(s.split('/n'))
But I am not completely sure where to put the break, or whether there is a better way.
Any help would be greatly appreciated.
Edit:
For example, I have these strings:
Hi my name is hazza 50 test test test
Hi hazza 60 test test test
hazza 50 test test test
hazza hazza
test test test
The code by codenewbie works fine for the first three strings but not the last.
I need the output to look like
Hi my name is hazza
Hi hazza
hazza
hazza hazza
You can use re.match() with the pattern [^\d]* to match any non-digit characters:
import re
strings = '''
Hi my name is hazza 50 test test test
Hi hazza 60 test test test
hazza 50 test test test
hazza hazza test test test
'''
for s in strings.splitlines():
    if s != '':
        print(re.match(r'[^\d]*', s)[0])
Prints:
Hi my name is hazza
Hi hazza
hazza
hazza hazza test test test
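To see why this avoids the original error: re.findall(r'(.+?)\d', s) returns an empty list when the line has no digit, so indexing it with [0] raises IndexError, whereas [^\d]* can match zero or more characters and therefore always matches. A minimal sketch of the same idea wrapped in a helper follows; the function name text_before_digit is hypothetical, and the rstrip() call is an addition that drops the space left before the number.

```python
import re

def text_before_digit(line):
    # [^\d]* may match zero characters, so re.match never returns None here.
    match = re.match(r'[^\d]*', line)
    # rstrip() is an addition: it removes the space that preceded the digit.
    return match.group(0).rstrip()

print(text_before_digit('Hi hazza 60 test test test'))
print(text_before_digit('hazza hazza test test test'))
```

A line without any digit is returned whole, which is why the last line above still includes "test test test".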
EDIT: Based on the comments, the new version:
import re
# Records are separated by blank lines, which the \n\n in the pattern relies on;
# the last record spans two lines.
strings = '''Hi my name is hazza 50 test test test

Hi hazza 60 test test test

hazza 50 test test test

hazza hazza
test test test
'''
for s in re.findall(r'(.*?)(?:\n\n|\n$)', strings, flags=re.S):
    print(re.match(r'(.*?)(?=\d|\n)', s)[0])
Prints:
Hi my name is hazza
Hi hazza
hazza
hazza hazza
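The same result could also be reached by splitting instead of matching. The sketch below is one possible alternative, not the answerer's method. It assumes that records are separated by blank lines and that the wanted text ends at the first digit or line break, whichever comes first. The names leading_words and sample are hypothetical.

```python
import re

def leading_words(text):
    """Return, per record, the text before the first digit or line break."""
    results = []
    # Assumption: records are separated by one or more blank lines.
    for record in re.split(r'\n\s*\n', text.strip()):
        # Cut at the first digit or newline, whichever comes first.
        head = re.split(r'[\d\n]', record, maxsplit=1)[0]
        results.append(head.strip())
    return results

sample = (
    "Hi my name is hazza 50 test test test\n\n"
    "Hi hazza 60 test test test\n\n"
    "hazza 50 test test test\n\n"
    "hazza hazza\ntest test test\n"
)
for words in leading_words(sample):
    print(words)
```

Stripping each result removes the trailing space that would otherwise remain before the number.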