
Python: Replacing a string with unique replacements

I'm reading a file and I need to replace certain empty tags ([[Image:]]).

The problem is every replacement has to be unique.

Here's the code:

import re
import codecs

re_imagematch = re.compile(r'(\[\[Image:([^\]]+)?\]\])')

wf = codecs.open('converted.wiki', "r", "utf-8")
wikilines = wf.readlines()
wf.close()

imgidx = 0
for i in range(len(wikilines)):
    if re_imagematch.search(wikilines[i]):
        print('MATCH #######################################################')
        print(wikilines[i])
        wikilines[i] = re_imagematch.sub('[[Image:%s_%s.%s]]' % ('outname', imgidx, 'extension'), wikilines[i])
        print(wikilines[i])
        imgidx += 1

This does not work, because one line can contain several tags and they all get the same index.

Here's the input file:

[[Image:]][[Image:]]
[[Image:]]

This is what the output should look like:

[[Image:outname_0.extension]][[Image:outname_1.extension]]
[[Image:outname_2.extension]]

This is what it currently looks like:

[[Image:outname_0.extension]][[Image:outname_0.extension]]
[[Image:outname_1.extension]]

I tried using a replacement function, but the problem is that re.sub only calls this function once per line.

You can use itertools.count here and take advantage of the fact that default arguments are evaluated once, when the function is created, so a mutable default argument keeps its state between function calls.

from itertools import count

def rep(m, cnt=count()):
    return '[[Image:%s_%s.%s]]' % ('outname', next(cnt) , 'extension')

This function will be invoked for each match found and it'll use a new value for each replacement.

So, you simply need to change this line in your code:

wikilines[i] = re_imagematch.sub(rep, wikilines[i])
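Putting the pieces together, a minimal end-to-end sketch might look like the following. It is written for Python 3, inlines the file contents as a list instead of reading converted.wiki, and uses a hypothetical helper name, number_images. It also keeps the counter in a closure rather than in a default argument, which is a variation on the approach above and not the exact code of the answer.

```python
import re
from itertools import count

# Same pattern as in the question, written as a raw string.
re_imagematch = re.compile(r'(\[\[Image:([^\]]+)?\]\])')


def number_images(lines, outname='outname', extension='extension'):
    """Return a copy of lines with every image tag given its own index."""
    counter = count()  # shared by all lines, so numbering never restarts

    def rep(match):
        # re.sub calls this once per match, not once per line.
        return '[[Image:%s_%s.%s]]' % (outname, next(counter), extension)

    return [re_imagematch.sub(rep, line) for line in lines]


wikilines = ['[[Image:]][[Image:]]\n', '[[Image:]]\n']
result = number_images(wikilines)
print(''.join(result), end='')
```

Each call to number_images starts from 0 again, which may be convenient if several files are converted in one run.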

Demo:

def rep(m, count=count()):
    return str(next(count))

>>> re.sub(r'a', rep, 'aaa')
'012'
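Because the counter lives in the default argument, it keeps running across separate re.sub calls, which is what numbering tags over several lines relies on. The sketch below (Python 3; the parameter is named cnt only to avoid shadowing itertools.count) illustrates this, and also shows one possible way to restart the numbering by binding a fresh counter with functools.partial:

```python
import re
from functools import partial
from itertools import count


def rep(m, cnt=count()):
    # cnt is created once, when the def statement runs.
    return str(next(cnt))


first = re.sub(r'a', rep, 'aaa')    # uses 0, 1, 2
second = re.sub(r'a', rep, 'aa')    # continues with 3, 4
print(first, second)

# Binding a new counter gives an independent numbering that starts at 0.
fresh = re.sub(r'a', partial(rep, cnt=count()), 'aa')
print(fresh)
```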

To get the last value the counter handed out, without advancing it:

>>> from copy import copy
>>> next(copy(rep.__defaults__[0])) - 1
2
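If the counter value is needed regularly, it may be simpler to keep the state in a small callable object than to reach into rep.__defaults__. The class below is a hypothetical illustration (the name ImageNumberer and its attributes are not from the answer), written for Python 3 and matching only the literal empty tag:

```python
import re


class ImageNumberer:
    """Callable replacement for re.sub that remembers how many tags it numbered."""

    def __init__(self, outname='outname', extension='extension'):
        self.outname = outname
        self.extension = extension
        self.used = 0  # number of replacements made so far

    def __call__(self, match):
        tag = '[[Image:%s_%s.%s]]' % (self.outname, self.used, self.extension)
        self.used += 1
        return tag


numberer = ImageNumberer()
text = re.sub(r'\[\[Image:\]\]', numberer, '[[Image:]][[Image:]]\n[[Image:]]')
print(text)
print(numberer.used)  # how many tags were numbered
```

Here numberer.used can be read at any time, and creating a new ImageNumberer restarts the numbering.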

I'd use a simple string replacement wrapped in a while loop:

s = '[[Image:]][[Image:]]\n[[Image:]]'
pattern = '[[Image:]]'
i = 0
# Replace one occurrence per pass so each gets its own index.
while pattern in s:
    s = s.replace(pattern, '[[Image:outname_' + str(i) + '.extension]]', 1)
    i += 1
print(s)
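One thing to keep in mind with the loop above is that each iteration searches the string from the beginning again, and that it only matches the literal empty tag. For large inputs, a single-pass variant is possible. The sketch below (Python 3, with the hypothetical function name number_literal) splits on the literal tag and stitches the pieces back together with numbered tags:

```python
def number_literal(s, pattern='[[Image:]]', outname='outname', extension='extension'):
    """Replace each literal occurrence of pattern with a uniquely numbered tag."""
    parts = s.split(pattern)  # n occurrences produce n + 1 pieces
    out = [parts[0]]
    for i, piece in enumerate(parts[1:]):
        out.append('[[Image:%s_%d.%s]]' % (outname, i, extension))
        out.append(piece)
    return ''.join(out)


s = '[[Image:]][[Image:]]\n[[Image:]]'
print(number_literal(s))
```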
