OCR and extracting text that follows a specific substring - regex using Python
I'm pretty new to regex, so I'm sure I'm missing something obvious, but I need a hand with the following problem.
I want to extract the text that follows a specific substring. I am working from a list of scanned documents; I have the example string below and want to extract everything after "FORENAME".
This is what I have done so far:
import re

regex = r"(?<=(FORE))[A-Z]+"
test_str = 'UNIQUE NUMBER 12345 678910 11 FROM THIS DOCUMENT | . ISSUED ON 2011-04-04 FORENAME GUIDO \\ SURNAME VAN ROSSUM. '
matches = re.finditer(regex, test_str)

for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum=matchNum, start=match.start(), end=match.end(), match=match.group()))
    # Capture groups are numbered from 1; group 0 is the whole match
    for groupNum in range(1, len(match.groups()) + 1):
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum=groupNum, start=match.start(groupNum), end=match.end(groupNum), group=match.group(groupNum)))
Which returns the following:
Match 1 was found at 78-82: NAME
Group 1 found at 74-78: FORE
What I want it to return is:
GUIDO \ SURNAME VAN ROSSUM.
Thanks!
What I want it to return is:
GUIDO \ SURNAME VAN ROSSUM.
Based on the above, you can use:
import re
test_str = 'UNIQUE NUMBER 12345 678910 11 FROM THIS DOCUMENT | . ISSUED ON 2011-04-04 FORENAME GUIDO \\ SURNAME VAN ROSSUM.'
result = re.sub(r"^.*FORENAME(.*?)$", r"\1", test_str)
print(result)
# GUIDO \ SURNAME VAN ROSSUM.
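A variant of the same idea, as a minimal sketch: re.search with a capturing group returns only the text after the marker and makes the no-match case explicit, whereas re.sub hands back the whole input unchanged when the pattern does not match. The helper name text_after is hypothetical, and stripping the surrounding whitespace is an assumption about the desired output. Note that this sketch uses the first occurrence of the marker, while the greedy ^.* in the re.sub pattern picks the last one.

```python
import re

def text_after(marker, text):
    """Return the text following the first occurrence of marker, or None."""
    # re.escape keeps any special characters in the marker literal;
    # (.*) captures the rest of the line after the marker.
    match = re.search(re.escape(marker) + r"(.*)", text)
    if match is None:
        return None
    # Assumption: leading/trailing whitespace is not wanted in the result.
    return match.group(1).strip()

test_str = 'UNIQUE NUMBER 12345 678910 11 FROM THIS DOCUMENT | . ISSUED ON 2011-04-04 FORENAME GUIDO \\ SURNAME VAN ROSSUM.'
print(text_after("FORENAME", test_str))
# GUIDO \ SURNAME VAN ROSSUM.
```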
You don't need regex for such a simple problem:
test_str = 'UNIQUE NUMBER 12345 678910 11 FROM THIS DOCUMENT | . ISSUED ON 2011-04-04 FORENAME GUIDO \\ SURNAME VAN ROSSUM. '
pos = test_str.find("FORENAME") + len("FORENAME")
print(test_str[pos:])
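One caveat with find: it returns -1 when the substring is missing, so the slice above would then start at index 7 and print unrelated text. A minimal sketch using str.partition, which signals a missing marker through an empty separator; falling back to an empty string and calling strip() are assumptions about the desired behaviour.

```python
test_str = 'UNIQUE NUMBER 12345 678910 11 FROM THIS DOCUMENT | . ISSUED ON 2011-04-04 FORENAME GUIDO \\ SURNAME VAN ROSSUM. '

# partition splits at the first occurrence: (before, separator, after).
# If the marker is absent, separator and after are both empty strings.
_, found, rest = test_str.partition("FORENAME")

# Assumption: return "" when the marker is missing, and trim whitespace.
result = rest.strip() if found else ""
print(result)
# GUIDO \ SURNAME VAN ROSSUM.
```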