
Convert numbers in string to `_NUM-*_` symbol

Given a string with numbers:

I counted, ' 1 2 3 4 5 6 7 8 9 10 '

The goal is to convert each number to a _NUM-*_ symbol, where * denotes the order in which the number occurs. E.g., given the above input, the desired output is:

"I counted, ' _NUM-1_ _NUM-2_ _NUM-3_ _NUM-4_ _NUM-5_ _NUM-6_ _NUM-7_ _NUM-8_ _NUM-9_ _NUM-10_ '"

The same holds even if there are repeated numbers, e.g. given the input

I said, ' 1 2 3 4 5 5 5 8 9 10 '

the desired output still numbers each token by its position, ignoring the value of the number itself, e.g.:

"I said, ' _NUM-1_ _NUM-2_ _NUM-3_ _NUM-4_ _NUM-5_ _NUM-6_ _NUM-7_ _NUM-8_ _NUM-9_ _NUM-10_ '"

I've tried:

import re

s = "I counted, ' 1 2 3 4 5 6 7 8 9 10 '"
num_regexp = r'(?<!\S)(?=.)(0|([1-9](\d*|\d{0,2}(,\d{3})*)))?(\.\d*[1-9])?(?!\S)'


re.sub(num_regexp, '_NUM_', s)

But it simply replaces every number with the same _NUM_ symbol, without keeping track of the order, i.e.:

[out]:

"I counted, ' _NUM_ _NUM_ _NUM_ _NUM_ _NUM_ _NUM_ _NUM_ _NUM_ _NUM_ _NUM_ '"

I could post-process the re.sub result and renumber each _NUM_, i.e.:

import re

s = "I counted, ' 1 2 3 4 5 6 7 8 9 10 '"
num_regexp = r'(?<!\S)(?=.)(0|([1-9](\d*|\d{0,2}(,\d{3})*)))?(\.\d*[1-9])?(?!\S)'

num_counter = 1
tokens = []
for token in re.sub(num_regexp, '_NUM_', s).split():
    if token == '_NUM_':
        token = '_NUM-{}_'.format(num_counter)
        num_counter += 1

    tokens.append(token)

result = ' '.join(tokens)

[out]:

"I counted, ' _NUM-1_ _NUM-2_ _NUM-3_ _NUM-4_ _NUM-5_ _NUM-6_ _NUM-7_ _NUM-8_ _NUM-9_ _NUM-10_ '"

Is there a better way to achieve the desired output, without first doing a generic re.sub and then editing the string post hoc?

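Use itertools.count as a default argument of the function passed to re.sub. re.sub accepts a callable as the replacement and calls it once per match; the default argument is evaluated only once, when the lambda is created, so the same counter advances across all matches.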

>>> import re
>>> from itertools import count
>>> s = "I counted, ' 1 2 3 4 5 6 7 8 9 10 '"
>>> re.sub(r'(\d+)', lambda m, c=count(1): '_NUM-{}_'.format(next(c)), s)
"I counted, ' _NUM-1_ _NUM-2_ _NUM-3_ _NUM-4_ _NUM-5_ _NUM-6_ _NUM-7_ _NUM-8_ _NUM-9_ _NUM-10_ '"

Note that I am using a simplified regex for matching numbers, just to demonstrate how to get the count; you can replace it with a regex that matches floats as well.
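If the substitution needs to be reused, one option is to wrap it in a function that creates a fresh counter on every call, so the numbering restarts at 1 for each input string. The sketch below is only an illustration: the function name number_placeholders and the NUMBER pattern (whitespace-delimited integers with optional thousands separators and an optional decimal part) are assumptions, not part of the original post, and the pattern is deliberately looser than the one in the question.

```python
import re
from itertools import count

# Assumed pattern (hypothetical, not from the original post): digits with
# optional ",ddd" groups and an optional decimal part, delimited by
# whitespace or the string boundaries.
NUMBER = re.compile(r'(?<!\S)\d+(?:,\d{3})*(?:\.\d+)?(?!\S)')


def number_placeholders(text, pattern=NUMBER):
    """Replace each match of `pattern` with _NUM-1_, _NUM-2_, ... in order."""
    counter = count(1)  # fresh counter per call, so numbering restarts at 1
    return pattern.sub(lambda m: '_NUM-{}_'.format(next(counter)), text)


print(number_placeholders("I said, ' 1 2 3 4 5 5 5 8 9 10 '"))
print(number_placeholders("Paid 1,250.75 for 3 items"))
```

Because the counter lives inside the function rather than in a default argument, calling it on a second string starts again at _NUM-1_, and repeated values such as 5 5 5 still get distinct indices.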
