
Replacing Certain Parts of a String in Python

I can't seem to solve this. I have many different strings, and they are always different. I need to replace the ends of them, but the ends are always different lengths. Here is an example of a couple of strings:

string1 = "thisisnumber1(111)"
string2 = "itsraining(22252)"
string3 = "fluffydog(3)"

Now when I print these out it will of course print the following:

thisisnumber1(111)
itsraining(22252)
fluffydog(3)

What I would like it to print though is the following:

thisisnumber1
itsraining
fluffydog

I would like it to remove the part in the parentheses for each string, but I do not know how, since the lengths are always changing. Thank you.

You can use str.rsplit for this:

>>> string1 = "thisisnumber1(111)"
>>> string2 = "itsraining(22252)"
>>> string3 = "fluffydog(3)"
>>>
>>> string1.rsplit("(")
['thisisnumber1', '111)']
>>> string1.rsplit("(")[0]
'thisisnumber1'
>>>
>>> string2.rsplit("(")
['itsraining', '22252)']
>>> string2.rsplit("(")[0]
'itsraining'
>>>
>>> string3.rsplit("(")
['fluffydog', '3)']
>>> string3.rsplit("(")[0]
'fluffydog'
>>>

str.rsplit splits the string from right to left rather than left to right like str.split, although the direction only makes a difference when a maxsplit argument is given. Here we split the string on ( and then retrieve the element at index 0 (the first element). Since each of these strings contains a single (, this will be everything before the (...) at the end of the string.
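If a string might contain an earlier "(" as well, passing maxsplit=1 makes the right-to-left direction matter. The following is a minimal sketch, not part of the original answer; the helper name strip_suffix_rsplit and the input "fluffy(dog)(3)" are hypothetical and only there to illustrate the difference.

```python
def strip_suffix_rsplit(s):
    """Return everything before the LAST "(" in s (hypothetical helper)."""
    # maxsplit=1 makes rsplit cut only once, starting from the right.
    # If s contains no "(", the list has one element and s is returned unchanged.
    return s.rsplit("(", 1)[0]


tricky = "fluffy(dog)(3)"  # hypothetical input with two "(" characters

print(tricky.rsplit("(")[0])         # no maxsplit: cuts at every "(", prints fluffy
print(tricky.rsplit("(", 1)[0])      # maxsplit=1: cuts at the last "(", prints fluffy(dog)
print(strip_suffix_rsplit("fluffydog(3)"))  # prints fluffydog
```

For the three strings in the question both forms give the same result, so the extra argument is only worth adding if the part before the trailing (...) can itself contain a parenthesis.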

Your other option is to use regular expressions, which can give you more precise control over what you want to get.

import re
regex = r"(.+)\(\d+\)"

# string1, string2 and string3 are the strings defined in the question
print(re.match(regex, string1).groups()[0])  # prints thisisnumber1
print(re.match(regex, string2).groups()[0])  # prints itsraining
print(re.match(regex, string3).groups()[0])  # prints fluffydog

Breakdown of what's happening:

regex = r"(.+)\(\d+\)" is the regular expression, the formula for the string you're trying to find

.+ means match 1 or more characters of any kind except newline

\d+ means match 1 or more digits

\( and \) are the literal "(" and ")" characters

putting .+ in parentheses puts that string sequence in a group, meaning that group of characters is one that you want to be able to access later on. We don't put the sequence \(\d+\) in a group because we don't care about those characters.

re.match(regex, string1).groups() gives every substring in string1 that was captured by a group. Since you only want 1 substring, you just access the 0th element.
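One thing to watch for: re.match returns None when the string does not fit the pattern, so calling .groups() on the result would raise an AttributeError. Here is a small sketch of a guarded version, assuming you want strings without a trailing (digits) part returned unchanged. The names PATTERN and base_name are hypothetical, and the trailing $ is an addition that ties the match to the end of the string.

```python
import re

# Same idea as the pattern above, plus "$" so the (digits) part
# must sit at the very end of the string. PATTERN is a hypothetical name.
PATTERN = re.compile(r"(.+)\(\d+\)$")


def base_name(s):
    """Return the text before a trailing "(digits)", or s itself if there is none."""
    m = PATTERN.match(s)
    # m is None when s does not end in "(digits)"; fall back to s in that case.
    return m.group(1) if m else s


print(base_name("thisisnumber1(111)"))  # prints thisisnumber1
print(base_name("noparens"))            # prints noparens
```

m.group(1) is equivalent to m.groups()[0] here; both refer to the first (and only) capturing group.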

There's a nice tutorial on regular expressions on Tutorials Point if you want to learn more.

Since you say in a comment:

"all that will be in the parentheses will be numbers"

so you'll always have digits between your parens, I'd recommend taking a look at removing them with the regular expression module:

import re

string1 = "thisisnumber1(111)"
string2 = "itsraining(22252)"
string3 = "fluffydog(3)"

strings = string1, string2, string3

for s in strings:
    s_replaced = re.sub(
        r'''
        \( # must escape the parens, since these are special characters in regex
        \d+ # one or more digits, 0-9
        \)
        ''', # this regular expression will be replaced by the next argument
        '', # replace the above with an empty string
        s, # the string we're modifying
        flags=re.VERBOSE) # must be passed as flags=; the 4th positional argument of re.sub is count
    print(s_replaced)

prints:

thisisnumber1
itsraining
fluffydog
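Note that the pattern \(\d+\) on its own removes every parenthesized number in a string, not only the one at the end. If only the trailing one should go, a $ anchor can be added. This is a sketch under that assumption; TRAILING, strip_trailing_number, and the input "a(1)b(2)" are hypothetical names and data used for illustration.

```python
import re

# "$" anchors the match to the end of the string, so only a trailing
# "(digits)" is removed. TRAILING is a hypothetical name.
TRAILING = re.compile(r"\(\d+\)$")


def strip_trailing_number(s):
    """Remove a "(digits)" suffix from s, leaving the rest untouched."""
    return TRAILING.sub("", s)


names = ["thisisnumber1(111)", "itsraining(22252)", "fluffydog(3)", "a(1)b(2)"]
print([strip_trailing_number(n) for n in names])
# ['thisisnumber1', 'itsraining', 'fluffydog', 'a(1)b']
```

Compiling the pattern once is optional; it mainly keeps the loop tidy when the same expression is applied to many strings.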
