Find the position of the next occurrences in string

I'm new to Python and trying to get familiar with regular expressions and string processing. I have written a regular expression that identifies the numbers throughout a string and extracts them into an array.

I want a parallel array which contains the positions of found terms.

To clarify, suppose that the main string is:

text = """11 scholars are selected to comptete on Feb 20 , 2019. 
Afterwards, 11 professors will review their submitted work. 
The results will be announced on Mar 20 , 2019."""

As I said, I can match nums = ['11', '20', '2019', '11', '20', '2019'] from the string above. Now I want to build a synced array that stores the position of each of these numbers. I'm using the following snippet:

positions = []
for num in nums:
   pos = text.find(num)
   positions.append(num + ' : ' + str(pos))

The positions array contains positions = ['11 : 0', '20 : 44', '2019 : 49', '11 : 0', '20 : 44', '2019 : 49'], which is obviously not what I want. The list contains duplicate numbers (two 11s, two 20s and two 2019s), and text.find(num) always returns the first occurrence of the term. So when the program reaches a later occurrence of a token, it still returns the position of the first one.

Any thoughts on how to fix this?

You can use re.finditer, which returns an iterator yielding match objects. From each match you can get the matched string (m.group(0)) and its start position (m.start()):

import re

text = """11 scholars are selected to comptete on Feb 20 , 2019. 
Afterwards, 11 professors will review their submitted work. 
The results will be announced on Mar 20 , 2019."""

[(m.group(0), m.start()) for m in re.finditer(r'\d+', text)]
# [('11', 0), ('20', 44), ('2019', 49), ('11', 68), ('20', 154), ('2019', 159)]

Or, if you want it formatted as in your question:

['{} : {}'.format(m.group(0), m.start()) for m in re.finditer(r'\d+', text)]
# ['11 : 0', '20 : 44', '2019 : 49', '11 : 68', '20 : 154', '2019 : 159']
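Since the question asks for a parallel array, here is a minimal sketch that builds both synced lists from a single finditer pass, so nums[k] is always the number found at positions[k]. The text is written with explicit "\n" joins (an assumption that the original lines end with a trailing space, which is what the offsets above imply). m.span() is also shown in case the end offset is useful.

```python
import re

# Same sample text; trailing spaces before each newline are made explicit.
text = ("11 scholars are selected to comptete on Feb 20 , 2019. \n"
        "Afterwards, 11 professors will review their submitted work. \n"
        "The results will be announced on Mar 20 , 2019.")

# Materialize the matches once so both lists come from the same scan.
matches = list(re.finditer(r'\d+', text))

# Parallel lists: nums[k] was found starting at positions[k].
nums = [m.group(0) for m in matches]
positions = [m.start() for m in matches]

print(nums)       # ['11', '20', '2019', '11', '20', '2019']
print(positions)  # [0, 44, 49, 68, 154, 159]

# span() returns (start, end), where end is one past the last character.
print(matches[2].span())  # (49, 53)
```

Taking the numbers and positions from the same match objects avoids a second search entirely, so duplicates can never get out of sync.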

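@Thierry's approach is surely Pythonic and makes good use of regular expressions. A simpler approach, without regular expressions, is to pass a start index to text.find so that each search begins just after the previous match: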

positions = []
i = 0  # index where the next search starts
for num in nums:
    pos = text.find(num, i)  # look only from index i onward
    positions.append(num + ' : ' + str(pos))
    i = pos + len(num)  # resume right after this match

print(positions)
# ['11 : 0', '20 : 44', '2019 : 49', '11 : 68', '20 : 154', '2019 : 159']
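The same start-index idea can be wrapped in a small generator that lists every occurrence of one substring. find_all below is a hypothetical helper for illustration, not part of the standard library. The sample string is also made up, to show a caveat of plain substring search: it matches digits inside longer numbers (here "11" inside "2011"), which a regex with word boundaries avoids.

```python
import re

def find_all(text, sub):
    """Yield the start index of every non-overlapping occurrence of sub
    in text, using str.find's optional start argument."""
    if not sub:
        raise ValueError("sub must be non-empty")  # would otherwise loop forever
    start = 0
    while True:
        pos = text.find(sub, start)
        if pos == -1:  # no further occurrences
            return
        yield pos
        start = pos + len(sub)  # continue searching after this match

sample = "11 scholars, 11 professors, 2011"

# Plain substring search also hits the "11" at the end of "2011".
print(list(find_all(sample, "11")))  # [0, 13, 30]

# \b restricts matches to standalone numbers.
print([m.start() for m in re.finditer(r'\b11\b', sample)])  # [0, 13]
```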

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM