
Find and replace the uppercase characters

I want to find the uppercase characters in a string and replace each one with an underscore followed by the character ( _upperchar ).

E.g. input: HeLLo Capital Letters

output: _He_L_Lo _Capital _Letters

I tried this:

print "saran"
value = "HeLLo Capital Letters"
for word in value:
        print word
        if word.isupper():
                char = "_"
                value = value.replace(word,char + word)

print value

and the output I got is,

_He___L___Lo _Capital ___Letters

Could someone please help me get rid of the extra underscores?

Take a look at re.sub

>>> import re
>>> re.sub(r'([A-Z])', r'_\1', value)
'_He_L_Lo _Capital _Letters'

The issue in your example isn't that you're modifying the string whilst iterating over it. Python creates iter(value) at the start of the for loop, and rebinding value afterwards won't affect the loop, because strings are immutable and the loop keeps iterating over the original string. The problem is that value.replace replaces all occurrences in the string, and since there are 3 capital Ls, for example, each L gets 3 underscores ( value.replace('L', '_L') happens 3 times).
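A minimal sketch of both points, written for Python 3 (the list seen is only there for illustration): the loop visits the characters of the original string, and every replace call touches every occurrence of the letter.

```python
value = "HeLLo Capital Letters"

# str.replace rewrites every occurrence, not just one.
assert "LLL".replace("L", "_L") == "_L_L_L"

seen = []
for ch in value:  # iterates over the original string object
    seen.append(ch)
    if ch.isupper():
        # Rebinds the name `value`; the string being iterated is untouched.
        value = value.replace(ch, "_" + ch)

print("".join(seen))  # HeLLo Capital Letters
print(value)          # _He___L___Lo _Capital ___Letters
```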

Just use str.join: put a _ in front of each character that is uppercase, and keep every other character as it is:

s = "HeLLo Capital Letters"

print("".join(["_" + ch if ch.isupper() else ch for ch in s]))
_He_L_Lo _Capital _Letters
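One difference from an [A-Z] regex is worth knowing: str.isupper also accepts non-ASCII uppercase letters. A small sketch for Python 3, where the helper names mark_upper and mark_upper_ascii are made up for this example:

```python
import re

def mark_upper(text):
    """Prefix every character that str.isupper accepts with an underscore."""
    return "".join("_" + ch if ch.isupper() else ch for ch in text)

def mark_upper_ascii(text):
    """Prefix only the ASCII letters A-Z with an underscore."""
    return re.sub(r"[A-Z]", r"_\g<0>", text)

print(mark_upper("HeLLo Capital Letters"))  # _He_L_Lo _Capital _Letters

word = "\u00c9clair"  # starts with an accented capital E
print(mark_upper(word) == "_" + word)  # True: isupper sees the accented capital
print(mark_upper_ascii(word) == word)  # True: [A-Z] leaves it alone
```

For plain ASCII input the two give the same result.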

You run into issues because you are calling replace on the whole string each time, so the repeated L's, for example, end up with three _ each.

If you add print value, word at the start of the loop, you will see what happens:

HeLLo Capital Letters H
_HeLLo Capital Letters e
_HeLLo Capital Letters L
_He_L_Lo Capital _Letters L # second L
_He__L__Lo Capital __Letters o # after replacing twice, every L now has a double _
 ........................

Some timings against a compiled regex show that the list comprehension is roughly three times faster here:

In [13]: s = s * 50

In [14]: timeit "".join(["_" + ch if ch.isupper() else ch for ch in s])
10000 loops, best of 3: 98.9 µs per loop

In [15]: timeit  r.sub( r'_\1', s)
1000 loops, best of 3: 296 µs per loop
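To rerun the comparison outside IPython, something like the following should work with the standard timeit module. The session above does not show how r was defined, so the compiled ([A-Z]) pattern here is an assumption, and the function names are made up for the sketch. Absolute numbers will vary by machine and Python version.

```python
import re
import timeit

s = "HeLLo Capital Letters" * 50
r = re.compile(r"([A-Z])")  # assumed definition of `r`, not shown in the session

def with_join():
    return "".join(["_" + ch if ch.isupper() else ch for ch in s])

def with_regex():
    return r.sub(r"_\1", s)

# Both approaches must agree before timing them means anything.
assert with_join() == with_regex()

for func in (with_join, with_regex):
    seconds = timeit.timeit(func, number=1000)
    print(func.__name__, seconds)
```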

Look closely at what happens as your code is executed. I've added some print statements that show what's going on:

Replacing 'H' with '_H':
    _HeLLo Capital Letters

Replacing 'L' with '_L':
    _He_L_Lo Capital _Letters

Replacing 'L' with '_L':
    _He__L__Lo Capital __Letters

Replacing 'C' with '_C':
    _He__L__Lo _Capital __Letters

Replacing 'L' with '_L':
    _He___L___Lo _Capital ___Letters

You run into multiple L characters, and perform the replacement L → _L for each of them, so you get:

L → _L → __L → ___L → ...

The other solutions here apply the replacement ( L → _L ) on a character level, instead of on the whole string; that's why they work while yours doesn't.
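If you would rather keep the whole-string replace, one way to make it behave (a sketch for Python 3, not something the other answers propose) is to call it once per distinct uppercase letter instead of once per occurrence:

```python
value = "HeLLo Capital Letters"

# The set is built from the original string before the loop starts,
# so each distinct uppercase letter is replaced exactly once.
for ch in set(c for c in value if c.isupper()):
    value = value.replace(ch, "_" + ch)

print(value)  # _He_L_Lo _Capital _Letters
```

The order in which the set yields the letters does not matter, because the inserted underscore is never itself uppercase.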

The problem in your snippet is that value.replace rewrites every occurrence of the letter in the whole string each time the loop meets it, so repeated letters pick up extra underscores. Instead of replacing, just build a new string:

value = "HeLLo Capital Letters"
new_value = ""
for word in value:
    #print(word)
    if word.isupper():
        char = "_"
        new_value += char + word
    else:
        new_value += word

print(new_value)

If an uppercase character is encountered, the first branch runs and prepends the underscore; otherwise the character is simply appended unchanged.

print "saran"
value = "HeLLo Capital Letters"
print ''.join(['_'+ x if x.isupper() else x for x in value])
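
value = "HELLO Capital Letters"
result = ""                    # accumulate outside the loop; don't shadow the built-in str
for word in value:
    if word.isupper():
        result += "_" + word   # uppercase: prefix with an underscore
    else:
        result += word         # anything else is copied unchanged
print(result)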
