If I could not iterate through a Match object, what should I do instead?
I am struggling with one function. It scrapes Outlook: it finds a particular word, "Number1", and selects the numbers that are near this word. Some of these numbers begin with "0", and I want to replace that "0" with "32" and save the results as a list.
But I could not iterate through the Match object, and I do not know any other way to achieve my goal.
This is what I have tried:
def get_number(file):
    try:
        body = file.body
        matches = re.finditer(r"Number1:\s(.*)$", body, re.MULTILINE)
        list_of_numbers = []
        for match in matches:
            for i in match.group(1):
                if i[0] == 0:
                    list_of_numbers.append("32" + i[1:])
        return list_of_numbers
    except Exception as e:
        print(e)
This is an example of a typical email:
Subject: Test1
Hi,
You got a new answer from user Alex.
Code: alex123fj
Number1: 0611111111
Number2: 1020
Number3: 3032
Your regex will find only Number1: .., but there might be multiple numbers that start with 0, as you stated in your question. Here is a way to get the desired output:
import re
body = """
Subject: Test1
Hi,
You got a new answer from user Alex.
Code: alex123fj
Number1: 0611111111
Number2: 1020
Number3: 3032
"""
matches = re.findall(r"^Number.*$", body, re.MULTILINE)
# -> ['Number1: 0611111111', 'Number2: 1020', 'Number3: 3032']
# getting only numbers from that list
nums = [ch for match in matches for ch in match.split() if ch.isdigit()]
# -> ['0611111111', '1020', '3032']
list_of_numbers = ['32' + i[1:] for i in nums if i[0] == '0']
# -> ['32611111111']
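As a side note on why the original loop fails: match.group(1) is a string, so iterating over it yields single characters, and comparing a character with the integer 0 is never true. A minimal sketch of the same idea as a function built on re.finditer might look like the following. The name get_numbers, the sample variable, and the choice to take the body text directly (rather than the Outlook item) are assumptions for illustration, not part of the original code.

```python
import re

def get_numbers(body):
    """Collect 'NumberN:' values that start with 0, replacing that 0 with 32."""
    numbers = []
    # Each item yielded by finditer is a Match object; group(1) is the digit string.
    # The trailing \s* tolerates a '\r' left over from CRLF line endings.
    for match in re.finditer(r"^Number\d+:[ \t]*(\d+)\s*$", body, re.MULTILINE):
        value = match.group(1)
        if value.startswith("0"):  # compare with the string "0", not the int 0
            numbers.append("32" + value[1:])
    return numbers

sample = """Subject: Test1
Hi,
You got a new answer from user Alex.
Code: alex123fj
Number1: 0611111111
Number2: 1020
Number3: 3032
"""

print(get_numbers(sample))  # ['32611111111']
```

To restrict the search to Number1 only, as in the question, the pattern could start with ^Number1: instead of ^Number\d+:.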