Insert multiple Python substrings by index 'at the same time'

Suppose I have a string

a = 'The dog in the street.' (len(a) is 22; only the first eight indices are marked below).
     01234567  (just adding indices for extra illustration)

Now I want to change that string to include some arbitrary words in arbitrary places, say, from the (arbitrarily sized) dict:

d = {
    'w1': {'begin': '0', 'end': '3', 'w': 'BIG'},
    'w2': {'begin': '4', 'end': '7', 'w': 'BARKED'},
}

where each wx entry contains info about a word to insert, with the fields meaning:

  • begin: the start index of the word we want to insert after (inclusive)

  • end: the end index of the word we want to insert after (exclusive)

  • w: the word to insert
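Because begin is inclusive and end is exclusive, each pair behaves exactly like a Python slice. A quick check, using the example string and the dict above (the indices are stored as strings, so they need int() before slicing):

```python
a = 'The dog in the street.'
d = {
    'w1': {'begin': '0', 'end': '3', 'w': 'BIG'},
    'w2': {'begin': '4', 'end': '7', 'w': 'BARKED'},
}

# a[begin:end] is the word each new word should follow.
targets = {key: a[int(info['begin']):int(info['end'])] for key, info in d.items()}
print(targets)  # {'w1': 'The', 'w2': 'dog'}
```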

So 'applying' the dict d to string a, we would get:

a = 'TheBIGdogBARKEDin the street.'
     0123456789...

Note that, though I have ordered the dictionary values here so that the words to be inserted are in left-to-right order, this is not always the case.

I was initially trying to do this with something like:

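for word in d.values():
    insertion_loc = int(word['end'])  # the indices are stored as strings
    # Each insertion lengthens a, so the remaining 'end' indices drift out of place.
    a = "{}{}{}".format(a[:insertion_loc], word['w'], a[insertion_loc:])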

But when doing this, each iteration changes the total length of the string, so the begin and end indices are no longer valid for the next word in the dict that is to be inserted. The only other way that immediately comes to mind is calculating new insertion offsets based on the lengths of the previously inserted substrings and on whether the current string is to be inserted before or after their locations (which seems like it would look a bit ugly).
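The offset bookkeeping described above stays simple if the entries are first sorted by position: every earlier insertion then lies before the current one, so a single running total of inserted characters is enough. A minimal sketch, where insert_with_offsets is a hypothetical helper name and the dict is deliberately out of order:

```python
a = 'The dog in the street.'
d = {
    'w2': {'begin': '4', 'end': '7', 'w': 'BARKED'},  # deliberately out of order
    'w1': {'begin': '0', 'end': '3', 'w': 'BIG'},
}

def insert_with_offsets(text, entries):
    """Hypothetical helper: insert each word at its 'end' index in the original text."""
    shift = 0  # total length of the words inserted so far
    # Process insertions left to right, so every earlier insertion lies before
    # the current position and moves it right by exactly `shift` characters.
    for info in sorted(entries.values(), key=lambda info: int(info['end'])):
        pos = int(info['end']) + shift
        text = text[:pos] + info['w'] + text[pos:]
        shift += len(info['w'])
    return text

print(insert_with_offsets(a, d))  # TheBIG dogBARKED in the street.
```

Like the loop in the question, this only inserts, so the spaces at indices 3 and 7 are kept.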

Is there another way to do this? Thanks.

You can insert from the end backwards, so you don't have to account for the shifted indices.
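A minimal sketch of that idea, assuming the corrected dict from the question (insert_from_end is a hypothetical helper name): an insertion only moves the characters to its right, so working from the largest 'end' to the smallest leaves every remaining index untouched.

```python
a = 'The dog in the street.'
d = {
    'w1': {'begin': '0', 'end': '3', 'w': 'BIG'},
    'w2': {'begin': '4', 'end': '7', 'w': 'BARKED'},
}

def insert_from_end(text, entries):
    """Hypothetical helper: insert words right to left so indices never need adjusting."""
    # Each remaining insertion point is at or to the left of everything inserted
    # so far, so the original 'end' indices are still correct when used.
    for info in sorted(entries.values(), key=lambda info: int(info['end']), reverse=True):
        pos = int(info['end'])
        text = text[:pos] + info['w'] + text[pos:]
    return text

print(insert_from_end(a, d))  # TheBIG dogBARKED in the street.
```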

You can use re to find the characters that occur at d[word]['end'] and use str.format to replace those characters with the desired 'w' value:

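import re

s = "The dog.\n01234567"
d = {
    'w1': {'begin': '0', 'end': '3', 'w': 'BIG'},
    'w2': {'begin': '4', 'end': '7', 'w': 'BARKED'},
}
# Matches every occurrence of the characters found at the 'end' indices
# (and assumes s contains no literal braces, which str.format would interpret).
pattern = '|'.join(re.escape(s[int(b['end'])]) for b in d.values())
# Placeholders are filled left to right, so order the words by position.
words = [c['w'] for c in sorted(d.values(), key=lambda c: int(c['end']))]
final_s = re.sub(pattern, "{}", s).format(*words)
print(final_s)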

Output:

TheBIGdogBARKED
01234567
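The regex version matches characters rather than positions: with the question's string 'The dog in the street.' both 'end' indices point at a space, so all four spaces would become placeholders and str.format would raise an IndexError, since only two words are supplied. A position-based alternative builds the result from slices of the unmodified input in one pass, so no index ever shifts, and sorting by 'end' handles dicts that are not in left-to-right order. The replace_char flag is a hypothetical extra: the question's expected output 'TheBIGdogBARKEDin the street.' drops the spaces at indices 3 and 7, which corresponds to replacing the character at 'end' (as the regex answer does) rather than inserting before it.

```python
def apply_insertions(s, entries, replace_char=False):
    """Hypothetical helper: build the result in one left-to-right pass.

    Each word goes at its 'end' index of the original string. With
    replace_char=True the character at that index is dropped, chosen by
    position rather than by character value.
    """
    pieces = []
    prev = 0  # start of the next untouched slice of s
    for info in sorted(entries.values(), key=lambda info: int(info['end'])):
        end = int(info['end'])
        pieces.append(s[prev:end])
        pieces.append(info['w'])
        prev = end + 1 if replace_char else end
    pieces.append(s[prev:])
    return ''.join(pieces)


a = 'The dog in the street.'
d = {
    'w1': {'begin': '0', 'end': '3', 'w': 'BIG'},
    'w2': {'begin': '4', 'end': '7', 'w': 'BARKED'},
}
print(apply_insertions(a, d))                     # TheBIG dogBARKED in the street.
print(apply_insertions(a, d, replace_char=True))  # TheBIGdogBARKEDin the street.
```

Joining a list of pieces once avoids rebuilding the whole string for every insertion.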
