
How can I remove any repeated line in a multi-line string

I have a multi-line string that contains some repeated lines. I want to remove not just the repeated copies, but also the "original" line that they repeat.

I found some answers about removing only the repeated line while keeping the original, but I didn't know how to adapt them, and my attempt failed.

text = """<br/>
Somewhere in China there is a copy of this vid.<br/>
2 years ago<br/>
Not sure really<br/>
Aiur Productions<br/>
Aiur Productions<br/>
2 years ago<br/>
"""

lines_seen = set()  # holds lines already seen

for line in text:
       if line not in lines_seen:  # not a duplicate
            print(lines_seen.add(line))

I got several lines of None. As mentioned, the code above comes from a different question, where the asker wanted to remove repeated lines but keep the non-repeated ones and one copy of each repeated one. What I want is output like this:

Somewhere in China there is a copy of this vid.
Not sure really

with all duplicated lines (e.g. "2 years ago") removed, so that only lines that were not repeated in the original are left.

set.add() doesn't return anything, so when you try to print its return value you get None. If you want to both print the line and put it into the set, you need two separate statements:

for line in text.splitlines():  # iterating over text directly would yield single characters
    if line not in lines_seen:  # not a duplicate
        print(line)
        lines_seen.add(line)

This prints every line once, at its first appearance. If you want to print only the lines that are never duplicated, I would recommend keeping a parallel list of the lines that have not been repeated:

lines_seen = set()
unique_lines = []
for line in text.splitlines():  # split into lines; iterating over text yields characters
    if line not in lines_seen:
        lines_seen.add(line)
        unique_lines.append(line)
    elif line in unique_lines:
        unique_lines.remove(line)
# and then print all the lines that were not removed from unique_lines on their second appearance
# in the order that they first appeared
for line in unique_lines:
    print(line)

from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
    'Counter that remembers the order elements are first encountered'

    def __repr__(self):
        return '%s(%r)' % (self.__class__.__name__, OrderedDict(self))

    def __reduce__(self):
        return self.__class__, (OrderedDict(self),)



updated = []

for k,v in OrderedCounter(text.split('<br/>')).items():
    if v == 1:
        updated.append(k)

print('<br/>'.join(updated))
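On Python 3.7 and later, dictionaries preserve insertion order and Counter is a dict subclass, so the same idea can probably be written without the OrderedCounter helper. The following is only a minimal sketch, assuming the chunks are separated by `<br/>` as in the question; the function name `drop_repeated` and the sample string are made up for illustration.

```python
from collections import Counter

def drop_repeated(text, sep="<br/>"):
    """Return text with every chunk that occurs more than once removed.

    drop_repeated is a hypothetical helper name, not part of any library.
    """
    chunks = text.split(sep)
    counts = Counter(chunks)  # one pass to count each chunk
    # Walk the original list so the surviving chunks keep their order.
    return sep.join(chunk for chunk in chunks if counts[chunk] == 1)

sample = "a<br/>b<br/>a<br/>c"
print(drop_repeated(sample))  # b<br/>c
```

Iterating over the original list of chunks, instead of over the Counter, means the result does not depend on the ordering of the counter at all.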

You can solve your problem using this approach:

  1. count how many times each line occurs in the text;
  2. select the lines that occur only once.

from collections import Counter

text = """<br/>
Somewhere in China there is a copy of this vid.<br/>
2 years ago<br/>
Not sure really<br/>
Aiur Productions<br/>
Aiur Productions<br/>
2 years ago<br/>
"""

str_counts = Counter(text.replace('<br/>', '').split('\n'))
result = '\n'.join([elem for elem in str_counts if str_counts[elem] == 1])

print(result)
# Somewhere in China there is a copy of this vid.
# Not sure really

I am not 100% sure what you are asking, but I think you want to print all the lines except those that appear more than once.

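lines = []   # distinct lines, in order of first appearance
delete = []  # lines that turned out to be repeated
for line in text.split("\n"):
    if line in lines:
        if line not in delete:
            delete.append(line)
    else:
        lines.append(line)
# keep only the lines that were never repeated
lines = [line for line in lines if line not in delete]
print("\n".join(lines))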

This code isn't perfect, but it should convey the idea.
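The same idea can be sketched with two sets instead of two lists, which avoids rescanning a list for every line. This is an illustrative variant under stated assumptions, not the answerer's code: the function name `lines_seen_once` and the sample input are hypothetical.

```python
def lines_seen_once(text):
    """Return the lines of text that appear exactly once, in original order.

    lines_seen_once is a hypothetical helper name used for illustration.
    """
    seen = set()      # every line encountered so far
    repeated = set()  # lines encountered at least twice
    lines = text.splitlines()
    for line in lines:
        if line in seen:
            repeated.add(line)
        else:
            seen.add(line)
    # Second pass: drop every line that was flagged as repeated.
    return [line for line in lines if line not in repeated]

sample = "first\nsecond\nfirst\nthird\nsecond"
print(lines_seen_once(sample))  # ['third']
```

Set membership checks take roughly constant time, so this version makes two linear passes over the input regardless of how many lines repeat.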
