What is the best way to get rid of common substring prefix in Python3?

Let's assume we have a string and a list of strings:

String:

  • str1 = <common-part>

List of strings:

[<common-part>-<random-text-a>, <common-part>-<random-text-b>]

What is the best way (in terms of readability and code purity) to get a list like this:

[<random-text-a>, <random-text-b>]

I would compute the common prefix of all strings using os.path.commonprefix, then slice the strings to remove that prefix (this function lives in the os.path module but compares character by character and doesn't check path separators, so it's usable in a generic context):

import os

p = ["<common-part>-<some-text-a>", "<common-part>-<random-text-b>"]
commonprefix = os.path.commonprefix(p)

new_p = [x[len(commonprefix):] for x in p]

print(new_p)

result (since commonprefix is "<common-part>-<"):

['some-text-a>', 'random-text-b>']

notes:

  • this method handles a fully dynamic prefix that is not known in advance. By reversing the strings, it can also remove a common suffix.
  • it's better to slice with len than to use str.replace(): slicing is faster, it only touches the start of the string, and it is safe because we know every string starts with this prefix.
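The suffix variant mentioned in the notes could look roughly like this. It is only a sketch: the helper name strip_common_suffix and the sample file names are made up for illustration.

```python
import os

def strip_common_suffix(strings):
    # Reverse each string so the shared suffix becomes a shared prefix.
    reversed_strings = [s[::-1] for s in strings]
    suffix_len = len(os.path.commonprefix(reversed_strings))
    # Slice with an explicit end index: s[:-0] would return an empty
    # string when there is no common suffix at all.
    return [s[:len(s) - suffix_len] for s in strings]

names = ["report-a.txt", "report-b.txt"]
print(strip_common_suffix(names))  # ['report-a', 'report-b']
```

Like the prefix version, this compares character by character, so the detected suffix can end in the middle of a word.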

You can use list comprehensions, which are pretty pythonic:

[newstr.replace(str1, '', 1) for newstr in list_of_strings]

newstr.replace(str1, '', 1) will only replace the first occurrence of str1. Thanks to @ev-kounis for suggesting it.
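One caveat worth illustrating: replace with a count of 1 removes the first occurrence wherever it appears, not only at the start. The sketch below contrasts it with an anchored strip. The helper strip_prefix and the sample data are hypothetical; on Python 3.9 and later, str.removeprefix should behave the same way as the helper.

```python
def strip_prefix(s, prefix):
    # Hypothetical helper: drop prefix only when s really starts with it.
    return s[len(prefix):] if s.startswith(prefix) else s

str1 = "xxx-"
list_of_strings = ["xxx-56", "yyy-xxx-57"]

# replace(..., 1) removes the first match even in the middle of the string.
print([s.replace(str1, "", 1) for s in list_of_strings])  # ['56', 'yyy-57']

# The anchored version leaves strings without the prefix untouched.
print([strip_prefix(s, str1) for s in list_of_strings])   # ['56', 'yyy-xxx-57']
```

If every string is guaranteed to start with the prefix, both approaches give the same result.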

prefix = "xxx-"
MyList = ["xxx-56", "xxx-57", "xxx-58"]
# For each x in the list, x[len(prefix):] is the string x
# with its first len(prefix) characters dropped.
MyList = [x[len(prefix):] for x in MyList]

print(MyList)

---> ['56', '57', '58']

I would have done...

common = "Hello_"
lines = ["Hello_1 !", "Hello_2 !", "Hello_3 !"]

new_lines = []
for line in lines:
    # Finding first occurrence of the word we want to remove.
    startIndex = line.find(common) + len(common)
    new_lines.append(line[startIndex:])

print(new_lines)
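Note that find returns -1 when the word is missing, so the loop above would then start slicing at len(common) - 1 and silently cut characters from an unrelated line. A guarded variant might look like this (a sketch; the second sample line is invented to show the missing-prefix case):

```python
common = "Hello_"
lines = ["Hello_1 !", "Bye_2 !"]

new_lines = []
for line in lines:
    index = line.find(common)
    if index == -1:
        # The word is absent: keep the line unchanged instead of slicing it.
        new_lines.append(line)
    else:
        new_lines.append(line[index + len(common):])

print(new_lines)  # ['1 !', 'Bye_2 !']
```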

Just comparing performance against Jean-François Fabre's answer while we're at it:

from timeit import timeit
import os

def test_fabre(lines):
    # import os

    commonprefix = os.path.commonprefix(lines)
    return [x[len(commonprefix):] for x in lines]

def test_insert(common, lines):
    new_lines = []
    for line in lines:
        startIndex = line.find(common) + len(common)
        new_lines.append(line[startIndex:])
    return new_lines

print(timeit("test_insert(common, lines)", 'from __main__ import test_insert; common="Hello_";lines = ["Hello_1 !", "Hello_2 !", "Hello_3 !"]'))
print(timeit("test_fabre(lines)", 'from __main__ import test_fabre; lines = ["Hello_1 !", "Hello_2 !", "Hello_3 !"]'))

# test_insert outputs : 2.92963575145
# test_fabre outputs : 4.23027790484 (with import os OUTside func)
# test_fabre outputs : 5.86552750264 (with import os INside func)

str1 = "hello"
list1 = ["hello1", "hello2", "hello3"]
list2 = []
for i in list1:
    list2.append(i.replace(str1, ""))
print(list2)

This is the easiest way to do it.
