Remove string elements from a list of strings if their first characters match another string element in the list

Question

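I want to efficiently find and compare the string elements in a list, and then remove those that are only the beginning of another string element in the list (that is, the two share the same starting point).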

list1 = [ 'a boy ran' , 'green apples are worse' , 'a boy ran towards the mill' ,  ' this is another sentence ' , 'a boy ran towards the mill and fell',.....]

I intend to get a list like this:

list2 = [  'green apples are worse' , ' this is another sentence ' , 'a boy ran towards the mill and fell',.....]

In other words, among the elements that start with the same first characters, I want to keep only the longest string element.

Answer 1

Here is one way you can do it:

list1 = [ 'a boy ran' , 'green apples are worse' , 'a boy ran towards the mill' ,  ' this is another sentence ' , 'a boy ran towards the mill and fell']
list2 = []
for i in list1:
    keep = True  # i is kept unless another element starts with it
    for j in list1:
        if i is not j and j.startswith(i):
            keep = False
    if keep:
        list2.append(i)
>>> list2
['green apples are worse', ' this is another sentence ', 'a boy ran towards the mill and fell']
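The same pairwise check can be wrapped in a reusable function. This is a minimal sketch rather than part of the original answer: the name `drop_prefixes` is hypothetical, and it compares values instead of object identity, under the assumption that exact duplicates should be kept because neither is a proper prefix of the other.

```python
def drop_prefixes(strings):
    """Return the strings that are not a proper prefix of any other string.

    Hypothetical helper: O(n^2) comparisons, original order preserved.
    Assumption: exact duplicates are kept.
    """
    return [
        s for s in strings
        if not any(t != s and t.startswith(s) for t in strings)
    ]


list1 = ['a boy ran', 'green apples are worse', 'a boy ran towards the mill',
         ' this is another sentence ', 'a boy ran towards the mill and fell']
print(drop_prefixes(list1))
```

For the sample list this gives the same three sentences as the loop above.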

Answer 2

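As suggested by John Coleman in the comments, you can first sort the sentences and then compare consecutive sentences. If one sentence is a prefix of another, the sentence that directly follows it in the sorted list also starts with it, so we only need to compare consecutive sentences. To preserve the original order, you can use a set for quick lookup of the elements that survive the filter.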

list1 = ['a boy ran', 'green apples are worse', 
         'a boy ran towards the mill', ' this is another sentence ',
         'a boy ran towards the mill and fell']

srtd = sorted(list1)
filtered = set(list1)
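for a, b in zip(srtd, srtd[1:]):
    # a is dropped when its successor in sorted order starts with it
    if b.startswith(a):
        filtered.discard(a)  # discard: with duplicates, a may already be gone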

list2 = [x for x in list1 if x in filtered]

Afterwards, list2 is as follows:

['green apples are worse',
 ' this is another sentence ',
 'a boy ran towards the mill and fell']

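This runs in O(n log n), which is much faster than comparing all pairs of sentences in O(n²), but if the list is not too long, Vicrobot's simpler solution works just as well.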
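To make the sort-based idea easier to reuse and to compare against the pairwise approach, here is a hedged sketch as a function. The name `drop_prefixes_sorted` is hypothetical, and de-duplicating before sorting is an added assumption so that equal neighbours are never treated as prefixes of each other.

```python
def drop_prefixes_sorted(strings):
    """Sort-based filter: O(n log n) comparisons, original order preserved.

    Hypothetical helper. Assumption: exact duplicates are kept.
    """
    srtd = sorted(set(strings))  # unique values, so neighbours always differ
    # A string is a proper prefix of something iff its sorted successor starts with it.
    prefixes = {a for a, b in zip(srtd, srtd[1:]) if b.startswith(a)}
    return [s for s in strings if s not in prefixes]


list1 = ['a boy ran', 'green apples are worse', 'a boy ran towards the mill',
         ' this is another sentence ', 'a boy ran towards the mill and fell']
print(drop_prefixes_sorted(list1))
```

Only neighbouring pairs are inspected after sorting, which is where the saving over the all-pairs comparison comes from.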

Answer 3

Your question is somewhat ambiguous about how to handle ['a','ab','ac','add']. I'm assuming you want ['ab','ac','add'].
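The two readings of the question can be put side by side. This is only an illustrative sketch; the variable names `not_a_prefix` and `longest` are made up for the example.

```python
words = ['a', 'ab', 'ac', 'add']

# Reading 1 (the one assumed in this answer): drop a string only if
# some other string starts with it.
not_a_prefix = [w for w in words
                if not any(o != w and o.startswith(w) for o in words)]
print(not_a_prefix)            # ['ab', 'ac', 'add']

# Reading 2: keep only the longest string for each first character.
longest = {}
for w in words:
    if len(w) > len(longest.get(w[0], '')):
        longest[w[0]] = w
print(list(longest.values()))  # ['add']
```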

The code below additionally assumes that you don't have any empty strings. That's not a great assumption.
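Why empty strings need a decision can be shown in a few lines. In Python every string starts with the empty string, so a prefix-based filter treats '' as a prefix of every other element. The snippet is a small illustration with made-up variable names, not part of the answer's code.

```python
starts_with_empty = 'abc'.startswith('')
print(starts_with_empty)   # True

words = ['', 'abc', 'xyz']
kept = [w for w in words
        if not any(o != w and o.startswith(w) for o in words)]
print(kept)                # ['abc', 'xyz']
```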

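Basically, we're building a tree from the input values and keeping only the leaf nodes. This is probably the most complicated way to do it. I think it has the potential to be the most efficient, but I'm not sure ~~and it's not what you asked for anyway~~.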

from collections import defaultdict
from typing import Collection, Dict, Generator, Iterable, List, Union

# Exploded is a recursive data type representing a culled list of strings as a
# tree of character-by-character common prefixes. The leaves are the non-common
# suffixes.
Exploded = Dict[str, Union["Exploded", str]]

def explode(subject: Iterable[str]) -> Exploded:
    heads_to_tails = defaultdict(list)
    for s in subject:
        if s:
            heads_to_tails[s[0]].append(s[1:])
    return {
        head: prune_or_follow(tails)
        for (head, tails)
        in heads_to_tails.items()
    }

def prune_or_follow(tails: List[str]) -> Union[Exploded, str]:
    if 1 < len(tails):
        return explode(tails)
    else:  # We just assume tails is not empty.
        return tails[0]

def implode(tree: Exploded, prefix: Iterable[str] = ()) -> Generator[str, None, None]:
    for (head, continued) in tree.items():
        if isinstance(continued, str):
            yield ''.join((*prefix, head, continued))
        else:
            yield from implode(continued, (*prefix, head))

def cull(subject: Iterable[str]) -> Collection[str]:
    return list(implode(explode(subject)))

print(cull(['a','ab','ac','add']))
print(cull([ 'a boy ran' , 'green apples are worse' , 'a boy ran towards the mill' ,  ' this is another sentence ' , 'a boy ran towards the mill and fell']))
print(cull(['a', 'ab', 'ac', 'b', 'add']))

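Edit:
I flattened some of the calls, which I hope makes this easier to read and reason about. What bugs me is that I can't figure out the runtime complexity of this process. I think it's O(nm), where m is the length of the overlapping prefixes, compared with O(nm log(n)) for the string comparisons...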

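Edit:
I started another question over on Code Review, hoping someone could help me figure out the complexity. Someone there pointed out that the code as written didn't actually work: groupby is garbage by any reasonable interpretation of its name. I've swapped out the code above, and it's also easier to read this way.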

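Edit:
OK, I've imported some of the excellent suggestions from CR. At this point, I'm pretty sure that my runtime complexity is better than that of the sort-based option.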
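For comparison, the "build a tree, keep the leaves" idea can also be written with a plain nested-dict trie. This is a separate sketch, not the code from this answer: the name `cull_with_trie` and the `END` marker are hypothetical, and it assumes exact duplicates should be kept. It visits each character a constant number of times, so its work is proportional to the total length of the input, but it does not settle the complexity of the code above.

```python
def cull_with_trie(strings):
    """Keep the strings that are not a proper prefix of another string."""
    END = object()  # marker key meaning "a string ends at this node"
    root = {}
    for s in strings:
        node = root
        for ch in s:
            node = node.setdefault(ch, {})
        node[END] = True

    result = []
    for s in strings:
        node = root
        for ch in s:
            node = node[ch]
        # If END is the only key, no longer string passes through this node.
        if len(node) == 1:
            result.append(s)
    return result


print(cull_with_trie(['a', 'ab', 'ac', 'add']))  # ['ab', 'ac', 'add']
```

Unlike the recursive version, this sketch returns the survivors in their original input order.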

3 answers

Answer 1
3 2019-06-24 13:15:39

Answer 2
3 Accepted 2019-06-24 13:58:10

Answer 3
3 2019-06-24 14:44:10