What is the best way to find the intersection between two strings?

Question

I need to find the intersection between two strings. Assertions:

assert intersect("test", "tes") == list("tes"), "Assertion 1"
assert intersect("test", "ta") == list("t"), "Assertion 2"
assert intersect("foo", "fo") == list("fo"), "Assertion 3"
assert intersect("foobar", "foo") == list("foo"), "Assertion 4"

I tried different implementations of the intersect function. intersect receives two str parameters, w and w2.

List comprehension. Iterate and look for occurrences in the second string.

return [l for l in w if l in w2]

Fails assertions 1 and 2 because multiple t's in w match the single t in w2.
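A minimal sketch of why the comprehension over-matches (the name `intersect_listcomp` is made up for illustration): every `t` in `w` passes the `in w2` test independently, so nothing records that the single `t` in `w2` has already been used.

```python
def intersect_listcomp(w, w2):
    # Keeps every character of w that occurs anywhere in w2,
    # no matter how many times it occurs in w2.
    return [l for l in w if l in w2]

print(intersect_listcomp("test", "tes"))  # ['t', 'e', 's', 't']
print(intersect_listcomp("test", "ta"))   # ['t', 't']
```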

Set intersection.

return list(set(w).intersection(w2))
# or, equivalently:
return list(set(w) & set(w2))

Fails assertions 3 and 4 because a set is a collection of unique elements, so duplicated letters are discarded.
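A small sketch of the duplicate loss (the name `intersect_sets` is hypothetical). The result is sorted only for display, since a set has no guaranteed iteration order.

```python
def intersect_sets(w, w2):
    # A set stores each character once, so multiplicities are lost.
    return set(w) & set(w2)

# "foo" has two o's, but only one survives the conversion to a set.
print(sorted(intersect_sets("foobar", "foo")))  # ['f', 'o']
```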

Iterate and count.

out = ""
for c in w:
    # keep c only if it occurs in w2 and has not been kept already
    if c in w2 and c not in out:
        out += c
return out

Fails because it also eliminates duplicates.

difflib (Python documentation)

letters_diff = difflib.ndiff(word, non_wildcards_letters)
letters_intersection = []

for l in letters_diff:
    # each ndiff entry is a two-character code followed by the element
    letter_code, letter = l[:2], l[2:]
    if letter_code == "  ":  # two spaces mark an element common to both
        letters_intersection.append(letter)

return letters_intersection

Passes.
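One caveat worth checking before relying on this: difflib aligns the two sequences in order, so it behaves like a common-subsequence match rather than a multiset intersection. The sketch below (the name `intersect_difflib` is made up for illustration) passes on the test inputs but drops a shared letter when the two strings order their letters differently.

```python
import difflib

def intersect_difflib(w, w2):
    # Entries that start with two spaces are elements common to both inputs.
    return [l[2:] for l in difflib.ndiff(w, w2) if l.startswith("  ")]

print(intersect_difflib("test", "tes"))    # ['t', 'e', 's']
# "ab" and "ba" share both letters, but only one can be aligned in order.
print(len(intersect_difflib("ab", "ba")))  # 1
```

If order really doesn't matter for the inputs, a count-based approach avoids this.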

difflib works, but can anybody think of a better, more optimized approach?

EDIT: The function would return a list of chars. The order doesn't really matter.

Answer 1

Try this:

def intersect(string1, string2): 
    common = []
    for char in set(string1):
        common.extend(char * min(string1.count(char), string2.count(char)))

    return common

Note: It doesn't preserve the order (a set is unordered, so the letters come back in arbitrary order, not alphabetical order). But, as you say in your comments, order doesn't matter.
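The same min-of-counts idea can also be sketched with `collections.Counter`, whose `&` operator keeps each key with the smaller of the two counts. This is an alternative formulation, not the answer's own code, and the name `intersect_counter` is made up here. It counts each string once instead of calling `str.count` for every distinct character.

```python
from collections import Counter

def intersect_counter(string1, string2):
    # Counter & Counter keeps each character with min(count1, count2).
    common = Counter(string1) & Counter(string2)
    # elements() repeats each character as many times as its count.
    return list(common.elements())

print(sorted(intersect_counter("foobar", "foo")))  # ['f', 'o', 'o']
print(intersect_counter("test", "ta"))             # ['t']
```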

Answer 2

This works pretty well for your test cases:

def intersect(haystack, needle):
    while needle:
        pos = haystack.find(needle)
        if pos >= 0:
            return list(needle)
        needle = needle[:-1]
    return []

But bear in mind that all your test cases put the longer string first and the shorter one second, and none of them has an empty search term, an empty search space, or a non-match.
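To make those caveats concrete, here is the same function run on a few inputs the question's assertions don't cover (the inputs are chosen here for illustration). Since only prefixes of `needle` are ever tried, shared letters that don't form such a prefix are missed.

```python
def intersect(haystack, needle):
    # Shrink needle from the right until it occurs as a substring of haystack.
    while needle:
        if haystack.find(needle) >= 0:
            return list(needle)
        needle = needle[:-1]
    return []

print(intersect("abc", "cab"))  # ['c'], although a, b and c are all shared
print(intersect("abc", "xyz"))  # [] for a non-match
print(intersect("", "abc"))     # [] for an empty search space
print(intersect("abc", ""))     # [] for an empty search term
```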

Answer 3

Gives the number of co-occurrences for all n-grams in the two strings:

from collections import Counter

def all_ngrams(text):
    ngrams = ( text[i:i+n] for n in range(1, len(text)+1)
                           for i in range(len(text)-n+1) )
    return Counter(ngrams)

def intersection(string1, string2):
    count_1 = all_ngrams(string1)
    count_2 = all_ngrams(string2)
    return count_1 & count_2   # intersection:  min(c[x], d[x])

intersection('foo', 'f') # Counter({'f': 1})
intersection('foo', 'o') # Counter({'o': 1})
intersection('foobar', 'foo') # Counter({'f': 1, 'fo': 1, 'foo': 1, 'o': 2, 'oo': 1})
intersection('abhab', 'abab') # Counter({'a': 2, 'ab': 2, 'b': 2})
intersection('achab', 'abac') # Counter({'a': 2, 'ab': 1, 'ac': 1, 'b': 1, 'c': 1})
intersection('test', 'ates') # Counter({'e': 1, 'es': 1, 's': 1, 't': 1, 'te': 1, 'tes': 1})
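To get from this result back to the list of characters the question asks for, one option is to keep only the 1-grams and expand their counts. This is a sketch under that assumption, not part of the answer itself, and the helper name `char_intersection` is hypothetical. Note that enumerating every n-gram takes quadratic work in the string length, so it does more than the question needs.

```python
from collections import Counter

def all_ngrams(text):
    # Count every contiguous substring of text.
    ngrams = (text[i:i+n] for n in range(1, len(text)+1)
                          for i in range(len(text)-n+1))
    return Counter(ngrams)

def intersection(string1, string2):
    return all_ngrams(string1) & all_ngrams(string2)

def char_intersection(string1, string2):
    # Keep only the single-character n-grams, then expand the counts.
    common = intersection(string1, string2)
    singles = Counter({g: c for g, c in common.items() if len(g) == 1})
    return list(singles.elements())

print(sorted(char_intersection("foobar", "foo")))  # ['f', 'o', 'o']
```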

3 solutions

Solution 1: 2, accepted, 2018-08-08 15:05:29
Solution 2: 1, 2018-08-08 14:44:35
Solution 3: 0, 2018-08-08 15:53:38
