
Most efficient way to remove multiple substrings from a string?

What's the most efficient method to remove a list of substrings from a string?

I'd like a cleaner, quicker way to do the following:

words = 'word1 word2 word3 word4, word5'
replace_list = ['word1', 'word3', 'word5']

def remove_multiple_strings(cur_string, replace_list):
    # Remove every occurrence of each entry, one str.replace pass per entry
    for cur_word in replace_list:
        cur_string = cur_string.replace(cur_word, '')
    return cur_string

remove_multiple_strings(words, replace_list)  # returns ' word2  word4, '
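For reference, the same loop can be folded into a single expression with functools.reduce. This is a minimal sketch: it performs exactly the same sequence of str.replace calls as the loop, so it is shorter but not faster.

```python
from functools import reduce


def remove_multiple_strings(cur_string, replace_list):
    # Apply str.replace once per entry, feeding each result into the next call
    return reduce(lambda acc, word: acc.replace(word, ''), replace_list, cur_string)


words = 'word1 word2 word3 word4, word5'
replace_list = ['word1', 'word3', 'word5']
print(repr(remove_multiple_strings(words, replace_list)))  # ' word2  word4, '
```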

Regex:

>>> import re
>>> re.sub(r'|'.join(map(re.escape, replace_list)), '', words)
' word2  word4, '
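One caveat with the alternation pattern: Python's re tries the alternatives left to right and takes the first one that matches, so if one entry is a prefix of another (say 'word1' and 'word10'), the shorter entry can win and leave a stray '0' behind. A sketch that sorts the entries longest-first before joining them (the helper name remove_substrings_re is illustrative, not from the original):

```python
import re


def remove_substrings_re(text, substrings):
    if not substrings:
        return text
    # Longest entries first, so 'word10' is tried before its prefix 'word1'
    ordered = sorted(substrings, key=len, reverse=True)
    # re.escape makes each entry match literally, even if it contains '.', '+', etc.
    pattern = re.compile('|'.join(map(re.escape, ordered)))
    return pattern.sub('', text)


print(repr(remove_substrings_re('word1 word10 word2', ['word1', 'word10'])))
# '  word2'
```

Without the sort, the same call would match 'word1' inside 'word10' and return ' 0 word2'.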

The re.sub one-liner above is actually not as fast as your str.replace version, but it is definitely shorter:

>>> import hashlib, random
>>> words = ' '.join([hashlib.sha1(str(random.random()).encode()).hexdigest()[:10] for _ in range(10000)])
>>> replace_list = words.split()[:1000]
>>> random.shuffle(replace_list)
>>> %timeit remove_multiple_strings(words, replace_list)
10 loops, best of 3: 49.4 ms per loop
>>> %timeit re.sub(r'|'.join(map(re.escape, replace_list)), '', words)
1 loops, best of 3: 623 ms per loop

Gosh! More than 12x slower.
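The same comparison can be run as a standalone script with the standard timeit module instead of IPython's %timeit. This is a sketch that mirrors the data generation above (seeded here for repeatability, which is an addition); absolute numbers depend on your machine and Python version, so no results are shown.

```python
import hashlib
import random
import re
import timeit


def remove_multiple_strings(cur_string, replace_list):
    # The question's approach: one str.replace pass per entry
    for cur_word in replace_list:
        cur_string = cur_string.replace(cur_word, '')
    return cur_string


def remove_with_regex(cur_string, replace_list):
    # The one-liner: a single alternation of all (escaped) entries
    return re.sub('|'.join(map(re.escape, replace_list)), '', cur_string)


def make_data(n_words=10000, n_remove=1000, seed=0):
    # Space-separated 10-character hex "words", like the IPython snippet
    rng = random.Random(seed)
    words = ' '.join(
        hashlib.sha1(str(rng.random()).encode()).hexdigest()[:10]
        for _ in range(n_words)
    )
    replace_list = words.split()[:n_remove]
    rng.shuffle(replace_list)
    return words, replace_list


if __name__ == '__main__':
    words, replace_list = make_data()
    for func in (remove_multiple_strings, remove_with_regex):
        # Best of 3 runs, one call per run
        best = min(timeit.repeat(lambda: func(words, replace_list), number=1, repeat=3))
        print(f'{func.__name__}: {best * 1000:.1f} ms')
```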

But can we improve it? Yes.

Since we are only concerned with whole words, we can simply pick out each word in the words string with \w+ and check it against a set built from replace_list (yes, an actual set: set(replace_list)):

>>> s = set(replace_list)
>>> def sub(m):
...     # Remove the matched word if it is in s, otherwise keep it unchanged
...     return '' if m.group() in s else m.group()
...
>>> %timeit re.sub(r'\w+', sub, words)
100 loops, best of 3: 7.8 ms per loop
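Wrapped up as a reusable function (the name remove_words is illustrative), the set-based idea might look like the sketch below. Building the set inside the function and reading it from a closure avoids depending on a global s. Because it matches whole \w+ tokens, it behaves differently from str.replace when an entry is a prefix of a longer word: 'word10' survives even if 'word1' is in the list.

```python
import re

# Compiled once: a "word" is a maximal run of \w characters
WORD_RE = re.compile(r'\w+')


def remove_words(text, words_to_remove):
    # Set membership is an average O(1) hash lookup per matched word
    targets = set(words_to_remove)

    def sub(m):
        word = m.group()
        return '' if word in targets else word

    return WORD_RE.sub(sub, text)


print(repr(remove_words('word1 word2 word3 word4, word5', ['word1', 'word3', 'word5'])))
# ' word2  word4, '
print(repr(remove_words('word1 word10', ['word1'])))
# ' word10'
```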

For even larger strings and word lists, the str.replace approach and my first (regex alternation) solution end up taking roughly quadratic time, because each does work roughly proportional to len(words) × len(replace_list). The set-based solution, by contrast, should run in linear time.

Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.

© 2020-2024 STACKOOM.COM