Given a string with numbers:
I counted, ' 1 2 3 4 5 6 7 8 9 10 '
The goal is to convert each number to a _NUM-*_ symbol, where the * denotes the order in which the number occurs. E.g., given the above input, the desired output is:
"I counted, ' _NUM-1_ _NUM-2_ _NUM-3_ _NUM-4_ _NUM-5_ _NUM-6_ _NUM-7_ _NUM-8_ _NUM-9_ _NUM-10_'"
Even if there are repeated numbers, e.g. given the input
I said, ' 1 2 3 4 5 5 5 8 9 10 '
the desired output keeps the order of each number and ignores the value of the number itself, e.g.:
"I said, ' _NUM-1_ _NUM-2_ _NUM-3_ _NUM-4_ _NUM-5_ _NUM-6_ _NUM-7_ _NUM-8_ _NUM-9_ _NUM-10_'"
I've tried:
import re
s = "I counted, ' 1 2 3 4 5 6 7 8 9 10 '"
num_regexp = r'(?<!\S)(?=.)(0|([1-9](\d*|\d{0,2}(,\d{3})*)))?(\.\d*[1-9])?(?!\S)'
re.sub(num_regexp, '_NUM_', s)
But it simply replaced every number with the same _NUM_ symbol without keeping the order, i.e.
[out]:
"I counted, ' _NUM_ _NUM_ _NUM_ _NUM_ _NUM_ _NUM_ _NUM_ _NUM_ _NUM_ _NUM_ _NUM_ '"
I could do a post re.sub operation and replace each _NUM_, i.e.
import re
s = "I counted, ' 1 2 3 4 5 6 7 8 9 10 '"
num_regexp = r'(?<!\S)(?=.)(0|([1-9](\d*|\d{0,2}(,\d{3})*)))?(\.\d*[1-9])?(?!\S)'
num_counter = 1
tokens = []
for token in re.sub(num_regexp, '_NUM_', s).split():
    if token == '_NUM_':
        # Number each placeholder in order of appearance.
        token = '_NUM-{}_'.format(num_counter)
        num_counter += 1
    tokens.append(token)
result = ' '.join(tokens)
[out]:
"I counted, ' _NUM-1_ _NUM-2_ _NUM-3_ _NUM-4_ _NUM-5_ _NUM-6_ _NUM-7_ _NUM-8_ _NUM-9_ _NUM-10_ '"
Is there a better way to achieve the desired output without first doing a generic re.sub and then post-hoc string editing?
Use itertools.count as a default argument to the function passed to re.sub.
>>> from itertools import count
>>> re.sub('(\d+)', lambda m, c=count(1): '_NUM_-{}'.format(next(c)), s)
' _NUM_-1 _NUM_-2 _NUM_-3 _NUM_-4 _NUM_-5 _NUM_-6 _NUM_-7 _NUM_-8 _NUM_-9 _NUM_-10 '
Note that I am using a simplified regex for matching numbers just to demonstrate how to get the count; you could replace it with a regex that matches floats as well.
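For completeness, here is a minimal sketch of the same idea wrapped in a function, emitting the _NUM-k_ form the question asks for. The function name number_placeholders and the pattern NUM_PATTERN are hypothetical, and the pattern is a simplified assumption (standalone integers or decimals), not the full regex from the question. The counter is created inside the function rather than as a default argument, so numbering restarts at 1 on every call.

```python
import re
from itertools import count

# Assumed, simplified pattern: an integer or decimal that stands alone
# between whitespace (or at the string edges). Not the question's full regex.
NUM_PATTERN = re.compile(r'(?<!\S)\d+(?:\.\d+)?(?!\S)')


def number_placeholders(text):
    """Replace each standalone number with _NUM-k_, where k is its order of appearance."""
    counter = count(1)  # fresh counter per call, so numbering restarts at 1
    return NUM_PATTERN.sub(lambda m: '_NUM-{}_'.format(next(counter)), text)


print(number_placeholders("I said, ' 1 2 3 4 5 5 5 8 9 10 '"))
```

With a default-argument counter, as in the one-liner above, the count object is created once when the lambda is defined, so reusing that same lambda for a second string would continue from where the first left off. Creating the counter inside the function avoids that.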