How to remove any repeated lines in a multi-line string

Question

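I have a multi-line string that contains some repeated lines. I want to remove not only the repeated lines but also the "original" that they duplicate.

I found some answers about removing only the duplicate lines and keeping the original, but I couldn't work out how to adapt them, and my attempt failed.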

text = """<br/>
Somewhere in China there is a copy of this vid.<br/>
2 years ago<br/>
Not sure really<br/>
Aiur Productions<br/>
Aiur Productions<br/>
2 years ago<br/>
"""

lines_seen = set()  # holds lines already seen

for line in text:
    if line not in lines_seen:  # not a duplicate
        print(lines_seen.add(line))

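I get several lines of "None". As mentioned, the code above comes from another question, where the asker wanted to remove the repeated lines but keep the non-repeated lines plus one copy of each repeated line. What I want is output like this:

Somewhere in China there is a copy of this vid.
Not sure really

with every repeated line (e.g. "2 years ago") removed, so that only the lines that were not repeated in the original remain.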

Answer 1

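set.add() does not return anything useful: it modifies the set in place and returns None, so printing its return value prints None. If you want to both print the line and put it in the set, you need two separate statements: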

for line in text.splitlines():  # iterate over lines, not single characters
    if line not in lines_seen:  # not a duplicate
        print(line)
        lines_seen.add(line)
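To see the two pitfalls of the original attempt in isolation, here is a small sketch (the names `seen`, `result` and `sample` are made up for illustration). It shows that `set.add()` returns `None`, and that looping over a string directly yields single characters, so the text has to be split into lines before it is compared line by line.

```python
seen = set()
result = seen.add("2 years ago")   # add() mutates the set in place
print(result)                      # None
print(seen)                        # {'2 years ago'}

sample = "ab\ncd"
chars = list(sample)               # iterating a str yields characters
lines = sample.splitlines()        # splitlines() yields whole lines
print(chars)                       # ['a', 'b', '\n', 'c', 'd']
print(lines)                       # ['ab', 'cd']
```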

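This prints each line once, the first time it appears. If you only want to print the lines that never repeat, I suggest keeping a parallel list of the lines that have not been repeated so far: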

lines_seen = set()
unique_lines = []
for line in text.splitlines():  # iterate over lines, not single characters
    if line not in lines_seen:
        lines_seen.add(line)
        unique_lines.append(line)
    elif line in unique_lines:
        unique_lines.remove(line)
# and then print all the lines that were not removed from unique_lines on their second appearance
# in the order that they first appeared
for line in unique_lines:
    print(line)
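The same idea can be wrapped in a function. This is only a sketch: the name `lines_never_repeated` is hypothetical, and it assumes the text is split with `str.splitlines()`. The sample also shows that a line occurring three times stays removed, because on its third appearance it is in the set but no longer in the list.

```python
def lines_never_repeated(text):
    """Return the lines that occur exactly once, in order of first appearance."""
    lines_seen = set()
    unique_lines = []
    for line in text.splitlines():
        if line not in lines_seen:
            # first appearance: remember it and tentatively keep it
            lines_seen.add(line)
            unique_lines.append(line)
        elif line in unique_lines:
            # second appearance: the line is a repeat, so drop it
            unique_lines.remove(line)
    return unique_lines

sample = "a\nb\na\nc\na"
print(lines_never_repeated(sample))  # ['b', 'c']
```

Note that `unique_lines.remove(line)` and `line in unique_lines` scan the list, so this is fine for short texts but grows slow for very long ones.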

Answer 2

from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
    'Counter that remembers the order elements are first encountered'

    def __repr__(self):
        return '%s(%r)' % (self.__class__.__name__, OrderedDict(self))

    def __reduce__(self):
        return self.__class__, (OrderedDict(self),)



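updated = []

# count each '<br/>'-separated chunk, keeping first-appearance order
for k, v in OrderedCounter(text.split('<br/>')).items():
    if v == 1:  # keep only chunks that occur exactly once
        updated.append(k)

print('<br/>'.join(updated))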

Answer 3

You can solve your problem with this approach:

count the occurrences of each unique line in the text;
select the lines that occur only once.

from collections import Counter

text = """<br/>
Somewhere in China there is a copy of this vid.<br/>
2 years ago<br/>
Not sure really<br/>
Aiur Productions<br/>
Aiur Productions<br/>
2 years ago<br/>
"""

str_counts = Counter(text.replace('<br/>', '').split('\n'))
result = '\n'.join([elem for elem in str_counts if str_counts[elem] == 1])

print(result)
# Somewhere in China there is a copy of this vid.
# Not sure really
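The result comes out in first-appearance order because `Counter` is a `dict` subclass, and dictionaries preserve insertion order from Python 3.7 on. As a hedged sketch, the same two steps can be packaged as a helper; the function name `keep_unrepeated` and its `marker` parameter are assumptions made for illustration, not part of the answer. Unlike the snippet above, it filters out empty lines explicitly instead of relying on the empty string happening to occur twice.

```python
from collections import Counter

def keep_unrepeated(text, marker="<br/>"):
    """Drop the marker, then keep the non-empty lines that occur exactly once."""
    lines = text.replace(marker, "").split("\n")
    counts = Counter(lines)
    # Counter keeps keys in insertion order (Python 3.7+), so order is preserved.
    return [line for line in counts if line and counts[line] == 1]

sample = "<br/>\nfirst<br/>\nagain<br/>\nsecond<br/>\nagain<br/>\n"
print("\n".join(keep_unrepeated(sample)))
# first
# second
```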

Answer 4

I'm not 100% sure what you're asking, but I think you want to print all the lines except the ones that are repeated.

lines = []
delete = []
for line in text.split("\n"):
    if line in lines:
        if line not in delete:  # record each repeated line once
            delete.append(line)
    else:
        lines.append(line)
for line in delete:  # drop the kept first copy of every repeated line
    lines.remove(line)

This code isn't perfect, but it should convey the idea.
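One way to tighten this up, offered as a sketch rather than as the answerer's method, is to make two passes and track repeats in sets, so each membership check takes constant time. The function name `remove_all_repeats` is hypothetical.

```python
def remove_all_repeats(text):
    """Return text without any line that occurs more than once."""
    seen = set()
    repeated = set()
    lines = text.split("\n")
    # First pass: record which lines occur more than once.
    for line in lines:
        if line in seen:
            repeated.add(line)
        else:
            seen.add(line)
    # Second pass: keep only the lines that never repeated.
    return "\n".join(line for line in lines if line not in repeated)

print(remove_all_repeats("a\nb\na\nc"))
# b
# c
```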

Answer 1
1 2019-10-23 15:19:37

Answer 2
1 2019-10-23 16:14:45

Answer 3
1 2019-10-23 19:27:10

Answer 4
0 2019-10-23 15:25:46
