
How do I extract the span and match from a regex search?

Suppose I have the following data:

some_string = """
Dave Martin
615-555-7164
173 Main St., Springfield RI 559241122
davemartin101@exampledomain.com

Charles Harris
800-555-5669
969 High St., Atlantis VA 340750509
charlesharris101@exampledomain.com
"""

I used the following to find a pattern:

import re
pattern = re.compile(r'\d\d\d(-|\.)\d\d\d(-|\.)\d\d\d\d')
matches = pattern.finditer(some_string)

Printing each match object shows:

for match in matches:
    print(match)

<re.Match object; span=(21, 33), match='615-555-7164'>
<re.Match object; span=(131, 143), match='800-555-5669'>

I want to extract the span and match fields. I found the thread "Extract part of a regex match", which shows how to use group():

nums = []
for match in matches:
    nums.append(match.group(0))

I get the following result:

print(nums)
['615-555-7164', '800-555-5669']

As in the other Stack Overflow thread above, how can I extract the span?

This thread was marked for deletion by someone and then it was deleted. The justification for deletion was that I was seeking advice on software... which I was not. https://i.imgur.com/sbCfekf.png

If you just want the tuple holding the start and end index of each match, use span(). Its parameter works the same way as the one for group(): both take a group index, and index 0 refers to the entire match (in your pattern, indexes 1 and 2 refer to the two (-|\.) groups).

for match in matches:
    print(match.span(0))

Output (the exact offsets depend on the exact contents of some_string, including any leading whitespace):

(13, 25)
(103, 115)
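
To make the group-index behaviour concrete, here is a minimal sketch. The sample string `text` and the variable names are made up for illustration and are not the data from the question; only the pattern is taken from it.

```python
import re

# Hypothetical sample text, much shorter than the data in the question.
text = "Call 615-555-7164 now"
pattern = re.compile(r'\d\d\d(-|\.)\d\d\d(-|\.)\d\d\d\d')

match = pattern.search(text)

whole_span = match.span(0)       # entire match; same as match.span()
first_sep_span = match.span(1)   # first (-|\.) group only
start, end = match.start(), match.end()  # the two halves of span(0)

print(whole_span)        # (5, 17)
print(first_sep_span)    # (8, 9)
print(text[start:end])   # 615-555-7164
```

Slicing the original string with the span gives back the same text as group(0), which is a quick way to check that the offsets are the ones you expect.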

As for extracting the matched text, yes, your approach works fine. It is better to collect both the matched text and the span in the same loop:

nums = []
spans = []
for match in matches:
    nums.append(match.group(0))
    spans.append(match.span(0))
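
If you would rather keep each number next to its position, one possible variation is a single list of (text, span) pairs built with a list comprehension. This is only a sketch: `text` and `results` are illustrative names, and the sample string is not the data from the question.

```python
import re

# Hypothetical sample text with two phone numbers.
text = "a 615-555-7164 b 800.555.5669"
pattern = re.compile(r'\d\d\d(-|\.)\d\d\d(-|\.)\d\d\d\d')

# One pass over the iterator, keeping the matched text and its span together.
results = [(m.group(0), m.span(0)) for m in pattern.finditer(text)]

for number, (start, end) in results:
    print(number, start, end)
# 615-555-7164 2 14
# 800.555.5669 17 29
```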

Also, be aware that finditer returns an iterator, which can be consumed only once. After a loop has run through it, a second loop over the same matches object yields nothing, so call pattern.finditer(some_string) again if you need to iterate a second time.
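
The following sketch shows the exhaustion in practice, along with one common workaround: storing the matches in a list so they can be reused. The sample string and the names `first_pass`, `second_pass`, and `match_list` are assumptions made for this example.

```python
import re

# Hypothetical sample text with two phone numbers.
text = "615-555-7164 and 800-555-5669"
pattern = re.compile(r'\d\d\d(-|\.)\d\d\d(-|\.)\d\d\d\d')

matches = pattern.finditer(text)
first_pass = [m.group(0) for m in matches]
second_pass = [m.group(0) for m in matches]  # iterator is already exhausted

print(first_pass)    # ['615-555-7164', '800-555-5669']
print(second_pass)   # []

# A list of match objects can be looped over as many times as needed.
match_list = list(pattern.finditer(text))
nums = [m.group(0) for m in match_list]
spans = [m.span(0) for m in match_list]

print(nums)    # ['615-555-7164', '800-555-5669']
print(spans)   # [(0, 12), (17, 29)]
```

Keeping the list costs memory proportional to the number of matches, so for very large inputs a single pass over the iterator may still be the better choice.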
