Python - How to replace all occurrences of a substring with consecutive number and save changes to main string?

I have a string 'companydocuments' inside a txt file.
I need to count all occurrences of the given string and replace each one with the string followed by its consecutive number,
e.g. 'companydocuments' was found 405 times, so the occurrences have to become 'companydocuments1', 'companydocuments2', and so on up to the last one (405), and the changes have to be saved to the file.
The aim is to use those strings as references further in the code to decide whether or not to perform certain operations.
My code does not work well: it always replaces all occurrences with the last number,
e.g. 'companydocuments405' for each record, and it does not save anything to the file.

#!/usr/bin/python
#Python 2.7.12

import re, os, string
with open('1.txt', 'r') as myfile:  
   lenght = myfile.read()
   a = lenght.count('COMPANYDOCUMENTS')
   a2 = re.findall('COMPANYDOCUMENTS', lenght)
   for i in range(a):
     string = 'COMPANYDOCUMENTS'
     b = [string + str(i) for i in range(a)]
     a2 = b[:]
     a3 = str(a2)
   content1 = lenght.replace('COMPANYDOCUMENTS', a3)
   myfile = open('1.txt', 'w')
   myfile.write(content1)
   myfile.close()

You can use re.sub with a replacement function that concatenates the match with a counter (using itertools.count):

from itertools import count
import re
lenght = 'abc companydocuments xyz companydocuments def companydocuments 123'
c = count(1)
print(re.sub('companydocuments', lambda m: m.group() + str(next(c)), lenght))

This outputs:

abc companydocuments1 xyz companydocuments2 def companydocuments3 123
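
To also save the result back to the file, as the question asks, the same substitution can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: the function names number_occurrences and number_occurrences_in_file are hypothetical, and re.escape is an added assumption so that the search string is matched literally.

```python
from itertools import count
import re


def number_occurrences(text, needle, start=1):
    """Append a running number to every occurrence of needle in text."""
    counter = count(start)
    # re.escape makes the needle match literally, even if it
    # contains regex metacharacters such as '.' or '('
    return re.sub(re.escape(needle),
                  lambda m: m.group() + str(next(counter)),
                  text)


def number_occurrences_in_file(path, needle):
    """Read the file, number the occurrences, and write the result back."""
    with open(path, 'r') as f:
        content = f.read()
    # The read handle is closed before the file is reopened for writing
    with open(path, 'w') as f:
        f.write(number_occurrences(content, needle))
```

With the file from the question, the call would look like number_occurrences_in_file('1.txt', 'COMPANYDOCUMENTS').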

There is a simpler way to do this. First, let me start with a sample string:

>>> a = "ABCHCYEQCUWC"
>>> import re
>>> re.split('(C)', a)
['AB', 'C', 'H', 'C', 'YEQ', 'C', 'UW', 'C', '']

The re module has a split() function that is similar to the string split() method, except that if you put the regex in parentheses, the separator is kept in the result. So I leverage this feature to produce a list of tokens in which every other token is the string you're interested in (yours is "COMPANYDOCUMENTS", mine is "C"). Now save it into a list:

>>> tokens = re.split('(C)', a)
>>> tokens[1::2]
['C', 'C', 'C', 'C']

So we want to modify these separators by appending a sequence number, which is easy in Python with enumerate() and a list comprehension:

>>> [x+str(i+1) for i,x in enumerate(tokens[1::2])]
['C1', 'C2', 'C3', 'C4']

And now you can replace the separators in the token list and rebuild the output string:

>>> tokens[1::2] = [x+str(i+1) for i,x in enumerate(tokens[1::2])]
>>> tokens
['AB', 'C1', 'H', 'C2', 'YEQ', 'C3', 'UW', 'C4', '']
>>> "".join(tokens)
'ABC1HC2YEQC3UWC4'
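
The interactive steps above can be collected into one function. This is only a sketch: the name number_with_split is hypothetical, and re.escape is an added assumption so that a substring containing regex metacharacters is still matched literally.

```python
import re


def number_with_split(text, needle):
    """Number each occurrence of needle using re.split with a capturing group."""
    # The capturing group keeps the separator in the result,
    # so the matches sit at the odd indices of the token list
    tokens = re.split('(' + re.escape(needle) + ')', text)
    # Slice assignment replaces every match with match + sequence number
    tokens[1::2] = [tok + str(i) for i, tok in enumerate(tokens[1::2], 1)]
    return ''.join(tokens)


print(number_with_split('ABCHCYEQCUWC', 'C'))  # prints ABC1HC2YEQC3UWC4
```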

Not the most efficient way, but it works:

readen = "sometext companydocument sometext companydocument ..."
delimiter = "companydocument"

result = ""
index = 0  # counts the occurrences of the delimiter found so far

for i in readen.split(delimiter):
    index += 1
    # append the intermediate text (i), the delimiter and its index to the result
    result += i + delimiter + str(index)

# the loop also appends a delimiter and index after the last piece of the split list,
# where no occurrence exists, so remove them
result = result[:-(len(str(index)) + len(delimiter))]

# result now holds "sometext companydocument1 sometext companydocument2 ..."
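
A variant of the same split-based idea can avoid the final trimming step by attaching the delimiter and its number in front of every piece except the first. This is a sketch under that assumption, and the function name number_with_join is hypothetical. Collecting the pieces in a list and joining once typically avoids repeated string concatenation.

```python
def number_with_join(text, delimiter):
    """Number each occurrence of delimiter using only str.split and str.join."""
    parts = text.split(delimiter)
    pieces = [parts[0]]
    # Every part after the first was preceded by exactly one delimiter
    for index, part in enumerate(parts[1:], 1):
        pieces.append(delimiter + str(index) + part)
    return ''.join(pieces)
```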
