
Replacing specific characters in a string in a text file

Hello all… I want to extract the text 'DesignerXXX' from a text file with the following contents:

C  DesignerTEE edBore 1 1/42006
Cylinder SingleVerticalB DesignerHHJ e 1 1/8Cooling 1
EngineBore 11/16 DesignerTDT 8Length 3Width 3
EngineCy DesignerHEE Inline2008Bore 1
Height 4TheChallen DesignerTET e 1Stroke 1P 305
Height 8C 606Wall15ccG DesignerQBG ccGasEngineJ 142
Height DesignerEQE C 60150ccGas2007

One idea is to use 'Designer' as a key and split each line into two parts: the part before the key and the part after it.

# Split each line that contains the key 'Designer' into two parts
with open('C:\\file.txt') as file_object:
    for line in file_object:
        if 'Designer' in line:
            where = line.find('Designer')
            before = line[:where]   # everything before the key
            after = line[where:]    # the key and everything after it

In the 'before the key' part, I need to find the LAST space (' ') and replace it with another symbol or character.

In the 'after the key' part, I need to find the FIRST space (' ') and replace it with another symbol or character.

Then I can slice the line at the new symbols and extract the text I want.

Is there a better way to extract the text I want? If not, how can I replace those specific spaces?

With str.replace I can limit the number of replacements, but I cannot choose exactly which occurrence gets replaced. How can I do that?

Thanks
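To address the last question directly without regular expressions: str.replace always works from the left, but str.rfind returns the index of the last occurrence of a substring and str.find the first, so you can rebuild the string around that single index. A minimal sketch, assuming the key occurs at most once per line; the function name mark_around_key and the '|' marker are hypothetical choices, and the marker must not already appear in the line:

```python
def mark_around_key(line, key='Designer', mark='|'):
    """Replace the last space before `key` and the first space after it with `mark`.

    Hypothetical helper for illustration; assumes `key` occurs at most once.
    """
    where = line.find(key)
    if where == -1:
        return line  # key not present: leave the line unchanged
    before, after = line[:where], line[where:]

    # rfind gives the index of the LAST space in the 'before' part
    i = before.rfind(' ')
    if i != -1:
        before = before[:i] + mark + before[i + 1:]

    # find gives the index of the FIRST space in the 'after' part
    j = after.find(' ')
    if j != -1:
        after = after[:j] + mark + after[j + 1:]

    return before + after


marked = mark_around_key('Cylinder SingleVerticalB DesignerHHJ e 1 1/8Cooling 1')
token = marked.split('|')[1]  # the text between the two markers
print(marked)  # Cylinder SingleVerticalB|DesignerHHJ|e 1 1/8Cooling 1
print(token)   # DesignerHHJ
```

Splitting on the marker then performs the slicing step described in the question.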

Using regular expressions, it's a trivial task:

>>> s = '''C  DesignerTEE edBore 1 1/42006
... Cylinder SingleVerticalB DesignerHHJ e 1 1/8Cooling 1
... EngineBore 11/16 DesignerTDT 8Length 3Width 3
... EngineCy DesignerHEE Inline2008Bore 1
... Height 4TheChallen DesignerTET e 1Stroke 1P 305
... Height 8C 606Wall15ccG DesignerQBG ccGasEngineJ 142
... Height DesignerEQE C 60150ccGas2007'''
>>> import re
>>> exp = 'Designer[A-Z]{3}'
>>> re.findall(exp, s)
['DesignerTEE', 'DesignerHHJ', 'DesignerTDT', 'DesignerHEE', 'DesignerTET', 'DesignerQBG', 'DesignerEQE']

The regular expression is Designer[A-Z]{3}, which means the letters Designer followed by any capital letter from A to Z, repeated exactly three times.

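So it won't match DesignerAB (only two letters), and it won't match Designer123 (digits are not letters). Be aware that {3} does not stop at a word edge: in DesignerABCD it still matches the first three letters, giving DesignerABC. Append \b (Designer[A-Z]{3}\b) if a longer run of letters should not match at all.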
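A quick check in Python 3 of how the fixed {3} quantifier behaves when more letters follow. The \b variant and the sample strings are illustrative additions, not part of the original answer:

```python
import re

# {3} consumes exactly three letters, even when more letters follow
no_boundary = re.findall(r'Designer[A-Z]{3}', 'DesignerABCD')
# \b requires a word boundary right after the third letter
with_boundary = re.findall(r'Designer[A-Z]{3}\b', 'DesignerABCD')
# A following space is a word boundary, so lines from the file still match
from_file = re.findall(r'Designer[A-Z]{3}\b', 'C  DesignerTEE edBore 1 1/42006')
# Digits are outside [A-Z], so this never matches
digits = re.findall(r'Designer[A-Z]{3}', 'Designer123')

print(no_boundary)    # ['DesignerABC']
print(with_boundary)  # []
print(from_file)      # ['DesignerTEE']
print(digits)         # []
```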

It also won't match Designerabc (abc are lowercase letters). To ignore case, pass the optional flag re.I as the third argument to re.findall; but this will also match designerabc (you have to be very specific with regular expressions).

So, to match Designer followed by exactly three upper- or lowercase letters, change the expression to Designer[A-Za-z]{3}. (Writing [Aa-zZ] does not do this: that class means A, the range a to z, and Z, so it misses most capital letters.)
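The differences between these patterns are easy to see on a small sample. The sample string below is made up for illustration:

```python
import re

text = 'DesignerTEE Designerabc designerXYZ DesignerAbC'

# Capitals only
upper_only = re.findall(r'Designer[A-Z]{3}', text)
# Upper- or lowercase letters after a case-sensitive 'Designer'
mixed = re.findall(r'Designer[A-Za-z]{3}', text)
# re.I also makes 'Designer' itself case-insensitive, so 'designerXYZ' matches
ignore_case = re.findall(r'Designer[A-Z]{3}', text, re.I)
# [Aa-zZ] is the set {A, a..z, Z}, so 'T', 'E' and 'C' are rejected
wrong_class = re.findall(r'Designer[Aa-zZ]{3}', text)

print(upper_only)   # ['DesignerTEE']
print(mixed)        # ['DesignerTEE', 'Designerabc', 'DesignerAbC']
print(ignore_case)  # ['DesignerTEE', 'Designerabc', 'designerXYZ', 'DesignerAbC']
print(wrong_class)  # ['Designerabc']
```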

To search and replace, use re.sub to substitute the matches. For example, to replace every match with the word 'hello':

>>> x = re.sub(exp, 'hello', s)
>>> print(x)
C  hello edBore 1 1/42006
Cylinder SingleVerticalB hello e 1 1/8Cooling 1
EngineBore 11/16 hello 8Length 3Width 3
EngineCy hello Inline2008Bore 1
Height 4TheChallen hello e 1Stroke 1P 305
Height 8C 606Wall15ccG hello ccGasEngineJ 142
Height hello C 60150ccGas2007

And what if there are characters both before and after 'Designer', and their length is not fixed? I tried '[Aa-zZ]Designer[Aa-zZ]{0~9}', but it doesn't work.

Regular expressions have special characters for exactly this. Briefly:

  • When you want to say "one or more, but at least one", use +
  • When you want to say "zero or more, so there may be none", use *
  • When you want to say "zero or one: optional, but at most once", use ?

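Put the modifier directly after the expression it should repeat, e.g. [A-Za-z]+. For an explicit range, use {m,n} with a comma, e.g. {0,9}; the {0~9} in the question is not a valid quantifier, so Python treats it as literal text.

For more on this, read through the documentation.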

Your requirement is "there are characters, but the length is not fixed", so we have to use +.
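A minimal sketch of the three options applied around the keyword. The sample string is hypothetical, since in the question's file the keyword is preceded by a space rather than by letters:

```python
import re

# Hypothetical sample: letters of varying length around 'Designer'
text = 'abDesignerXY fooDesignerABCDE DesignerQ'

# + : at least one letter is required on each side
plus_both = re.findall(r'[A-Za-z]+Designer[A-Za-z]+', text)
# * : the letters before 'Designer' become optional
star_before = re.findall(r'[A-Za-z]*Designer[A-Za-z]+', text)
# {m,n} : between 0 and 3 letters after 'Designer' (comma, not '~')
ranged = re.findall(r'Designer[A-Za-z]{0,3}', text)

print(plus_both)    # ['abDesignerXY', 'fooDesignerABCDE']
print(star_before)  # ['abDesignerXY', 'fooDesignerABCDE', 'DesignerQ']
print(ranged)       # ['DesignerXY', 'DesignerABC', 'DesignerQ']
```

The quantifiers are greedy, so each one takes as many letters as it can while the rest of the pattern still matches.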

Try re.sub. The regular expression matches your keyword surrounded by whitespace. The second argument to sub replaces the surrounding whitespace with your_special_char (a hyphen in my script).

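>>> import re
>>> with open('file.txt') as file_object:
...     your_special_char = '-'
...     for line in file_object:
...         # Replace the whitespace on each side of the keyword with the special character
...         formatted_line = re.sub(r'(\s)(Designer[A-Z]{3})(\s)', r'%s\2%s' % (your_special_char, your_special_char), line)
...         print(formatted_line, end='')  # each line already ends with '\n'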
... 
C -DesignerTEE-edBore 1 1/42006
Cylinder SingleVerticalB-DesignerHHJ-e 1 1/8Cooling 1
EngineBore 11/16-DesignerTDT-8Length 3Width 3
EngineCy-DesignerHEE-Inline2008Bore 1
Height 4TheChallen-DesignerTET-e 1Stroke 1P 305
Height 8C 606Wall15ccG-DesignerQBG-ccGasEngineJ 142
Height-DesignerEQE-C 60150ccGas2007

Maroun Maroun suggested "Why not simply split the string?", so one working approach is:

with open('C:\\file.txt') as file_object:
    for line in file_object:
        # split() with no argument splits on any run of whitespace
        for word in line.split():
            if 'Designer' in word:
                print(word)

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM