Convert numbers in string to `_NUM-*_` symbol

Question

Given a string with numbers:

I counted, ' 1 2 3 4 5 5 5 8 9 10 '

The goal is to convert the numbers to the _NUM-*_ symbol, where the * denotes the order in which the number occurs. E.g. given the above input, the desired output is:

"I counted, ' _NUM-1_ _NUM-2_ _NUM-3_ _NUM-4_ _NUM-5_ _NUM-6_ _NUM-7_ _NUM-8_ _NUM-9_ _NUM-10_'"

Even if there are repeated numbers, e.g. given the input

I said, ' 1 2 3 4 5 5 5 8 9 10 '

the desired output keeps the order of the numbers and ignores the value of each number itself, e.g.:

"I said, ' _NUM-1_ _NUM-2_ _NUM-3_ _NUM-4_ _NUM-5_ _NUM-6_ _NUM-7_ _NUM-8_ _NUM-9_ _NUM-10_'"

I've tried:

import re

s = "I counted, ' 1 2 3 4 5 6 7 8 9 10 '"
num_regexp = r'(?<!\S)(?=.)(0|([1-9](\d*|\d{0,2}(,\d{3})*)))?(\.\d*[1-9])?(?!\S)'


re.sub(num_regexp, '_NUM_', s)

But it simply replaced every number with the same _NUM_ symbol without keeping the order, i.e.

[out]:

"I counted, ' _NUM_ _NUM_ _NUM_ _NUM_ _NUM_ _NUM_ _NUM_ _NUM_ _NUM_ _NUM_ _NUM_ '"

I could do a post-processing step after re.sub and replace each _NUM_, i.e.

import re

s = "I counted, ' 1 2 3 4 5 6 7 8 9 10 '"
num_regexp = r'(?<!\S)(?=.)(0|([1-9](\d*|\d{0,2}(,\d{3})*)))?(\.\d*[1-9])?(?!\S)'

num_counter = 1
tokens = []
for token in re.sub(num_regexp, '_NUM_', s).split():
    if token == '_NUM_':
        token = '_NUM-{}_'.format(num_counter)
        num_counter += 1

    tokens.append(token)

result = ' '.join(tokens)

[out]:

"I counted, ' _NUM-1_ _NUM-2_ _NUM-3_ _NUM-4_ _NUM-5_ _NUM-6_ _NUM-7_ _NUM-8_ _NUM-9_ _NUM-10_ '"

Is there a better way to achieve the desired output without first doing a generic re.sub and then a post-hoc string edit?

Answer 1

Use itertools.count as a default argument to the function passed to re.sub.

>>> import re
>>> from itertools import count
>>> s = "I counted, ' 1 2 3 4 5 6 7 8 9 10 '"
>>> re.sub(r'\d+', lambda m, c=count(1): '_NUM-{}_'.format(next(c)), s)
"I counted, ' _NUM-1_ _NUM-2_ _NUM-3_ _NUM-4_ _NUM-5_ _NUM-6_ _NUM-7_ _NUM-8_ _NUM-9_ _NUM-10_ '"

Note that I am using a simplified regex for matching numbers just to demonstrate how to get the count; you could replace it with a regex that matches floats as well.
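
For reuse, the same idea could be wrapped in a function so that every call starts numbering from 1 again. This is only a sketch: the name number_placeholders and the pattern NUM_RE are made up for illustration, and the pattern is an assumed stand-in for "integers with optional thousands separators and an optional decimal part", not the regex from the question.

```python
import re
from itertools import count

# Assumed pattern (hypothetical, not the asker's regex): a whitespace-delimited
# integer, optionally with ",ddd" groups and an optional decimal part.
NUM_RE = re.compile(r'(?<!\S)\d+(?:,\d{3})*(?:\.\d+)?(?!\S)')

def number_placeholders(text):
    """Replace each matched number with _NUM-k_, where k is its order of occurrence."""
    counter = count(1)  # fresh counter per call, so numbering restarts at 1
    return NUM_RE.sub(lambda m: '_NUM-{}_'.format(next(counter)), text)

print(number_placeholders("I said, ' 1 2 3 4 5 5 5 8 9 10 '"))
```

Because the counter is created inside the function rather than as a default argument, it cannot leak state between calls, which matters if the replacement function is defined once and reused.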

1 answers

solution1
2 ACCPTED 2017-08-11 02:47:17