简体   繁体   中英

Find all matches and replace iteratively with an index

I have the following problem:

I would like to mark the matches found with an index. Example:

x = "hayde hayde bim bam hayde hayde bim bam hayde hayde bim bbm ba bi bim"

I want to replace all the bim s and bam s with <1>, <2>, ..

Like this:

x = "hayde hayde <1> <2> hayde hayde <3> <4> hayde hayde <3> bbm ba bi <5>"

And get the output has a dict:

{"bim": "<1>"}
{"bam": "<2>"}
..

I think it is a simple problem but I cannot find the approach to solve this. I have to use the re module, to find the match, not the str.replace . This here is an abstract example for my problem

I suspect there may be another way to approach your true problem more directly, but try this:

Code

import collections as ct


def replace(s, subs):
    """Return a tuple of substitutes and a related dict."""
    dd = ct.defaultdict(list)
    replaced = []
    for i, word in enumerate(s.split()):
        if word in set(subs):
            pos = "<{}>".format(i)
            replaced.append(pos)
            dd[word].append(pos)
        else:
            replaced.append(word)
    return " ".join(replaced), dict(dd)

Demo +

x = "hayde hayde bim bam bimbam hayde hayde bim bam hayde hayde bim bbm ba bi bim"
replace(x, ["bim", "bam"])

Output

('hayde hayde <2> <3> bimbam hayde hayde <7> <8> hayde hayde <11> bbm ba bi <15>',
{'bim': ['<2>', '<7>', '<11>', '<15>'], 'bam': ['<3>', '<8>']})

You commented:

I need to know, where I cleaned up the strings.

Why not enumerate the split string with numbers that reflect actual indexed positions? The numeric substitutes in this example thereby represent index positions of the split string. You can easily swap them out with an incrementing counter if you wish.

+ Note: the test input slightly differs from the OP ("bimbam").

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM