
How can I remove every repeated line from a multi-line string?

I have a multi-line string with some repeated lines. I want to remove not just the duplicate copies, but also the "original" line that they repeat.

I found some answers about removing just the repeated lines while keeping the original, but I didn't know how to adapt them, and my attempt failed:

text = """<br/>
Somewhere in China there is a copy of this vid.<br/>
2 years ago<br/>
Not sure really<br/>
Aiur Productions<br/>
Aiur Productions<br/>
2 years ago<br/>
"""

lines_seen = set()  # holds lines already seen

for line in text:
    if line not in lines_seen:  # not a duplicate
        print(lines_seen.add(line))

I got several lines of None. As mentioned, the code above comes from a different question, where the asker wanted to remove the repeated lines but keep the non-repeated ones plus one copy of each repeated one. What I want instead is output like this:

Somewhere in China there is a copy of this vid.
Not sure really

with every duplicated line (e.g. "2 years ago" and "Aiur Productions") removed entirely, so that only the lines that were never repeated in the original are left.

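set.add() doesn't return anything, so when you print its return value you get None. If you want to both print the line and add it to the set, you need two separate statements. Note also that looping directly over a string yields one character at a time, not one line at a time, so split the string into lines first (for example with text.splitlines()):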

lines_seen = set()
for line in text.splitlines():
    if line not in lines_seen:  # not a duplicate
        print(line)
        lines_seen.add(line)
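A small sketch, using a made-up string, that shows both pitfalls (the None returned by set.add() and the character-by-character loop) next to a working first-appearance loop:

```python
text = "a\nb\na\n"  # made-up sample data

# set.add() mutates the set in place and returns None
seen = set()
print(seen.add("a"))  # None

# Iterating over a str yields single characters, not lines
print(list("ab\n"))  # ['a', 'b', '\n']

# splitlines() gives the lines; keep each one on its first appearance
seen = set()
first_appearances = []
for line in text.splitlines():
    if line not in seen:
        seen.add(line)
        first_appearances.append(line)
print(first_appearances)  # ['a', 'b']
```

Note that this still keeps one copy of "a"; it only fixes the None and the looping problem.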

This prints every line once, on its first appearance. If you want to print only the lines that are never duplicated, I would recommend keeping a parallel list of the lines that have not been repeated:

lines_seen = set()
unique_lines = []
for line in text.splitlines():
    if line not in lines_seen:
        lines_seen.add(line)
        unique_lines.append(line)
    elif line in unique_lines:
        unique_lines.remove(line)
# then print all the lines that were not removed from unique_lines on their second appearance,
# in the order in which they first appeared
for line in unique_lines:
    print(line)
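Wrapped in a function (the name lines_never_repeated is only for illustration) and fed a made-up string where one line appears three times, the loop shows why the elif line in unique_lines guard matters: without it, the third appearance would call remove() on a line that is already gone, and list.remove() raises ValueError in that case.

```python
def lines_never_repeated(text):
    """Return the lines of text that occur exactly once, in first-seen order."""
    lines_seen = set()
    unique_lines = []
    for line in text.splitlines():
        if line not in lines_seen:
            lines_seen.add(line)
            unique_lines.append(line)
        elif line in unique_lines:
            # second appearance: drop the line; a third appearance skips this
            # branch because the line is no longer in unique_lines
            unique_lines.remove(line)
    return unique_lines


sample = "keep\nx\nx\nx\nalso keep"  # made-up sample data
print(lines_never_repeated(sample))  # ['keep', 'also keep']
```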
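Alternatively, count the lines with an order-preserving Counter and keep only those that occur once:

from collections import Counter, OrderedDict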

class OrderedCounter(Counter, OrderedDict):
    'Counter that remembers the order elements are first encountered'

    def __repr__(self):
        return '%s(%r)' % (self.__class__.__name__, OrderedDict(self))

    def __reduce__(self):
        return self.__class__, (OrderedDict(self),)



updated = []

for k, v in OrderedCounter(text.split('<br/>')).items():
    if v == 1:
        updated.append(k)

print('<br/>'.join(updated))

You can solve your problem using this approach:

  1. count how many times each line occurs in the text;
  2. select the lines that occur only once.

from collections import Counter

text = """<br/>
Somewhere in China there is a copy of this vid.<br/>
2 years ago<br/>
Not sure really<br/>
Aiur Productions<br/>
Aiur Productions<br/>
2 years ago<br/>
"""

str_counts = Counter(text.replace('<br/>', '').split('\n'))
result = '\n'.join([elem for elem in str_counts if str_counts[elem] == 1])

print(result)
# Somewhere in China there is a copy of this vid.
# Not sure really

I am not 100% sure what you are asking, but I think you want to print all the lines except the ones that are repeated.

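lines = []
delete = []
for line in text.split("\n"):
    if line in lines:
        if line not in delete:  # record each repeated line only once
            delete.append(line)
    else:
        lines.append(line)
# keep only the lines that were never repeated
lines = [line for line in lines if line not in delete]
print("\n".join(lines))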

This isn't the most efficient approach, since every "in" check on a list scans the whole list, but it conveys the idea.
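Because "in" on a list scans every element, the same two-collection idea can be sketched with sets instead, which keeps the whole pass roughly linear. The function name drop_repeated_lines is hypothetical, and the sample text is the question's text without the <br/> tags:

```python
def drop_repeated_lines(text):
    """Keep only the lines that appear exactly once, in their original order."""
    seen = set()
    repeated = set()
    lines = text.splitlines()
    for line in lines:
        if line in seen:
            repeated.add(line)  # seen a second (or later) time
        else:
            seen.add(line)
    # set membership checks are O(1) on average
    return [line for line in lines if line not in repeated]


text = (
    "Somewhere in China there is a copy of this vid.\n"
    "2 years ago\n"
    "Not sure really\n"
    "Aiur Productions\n"
    "Aiur Productions\n"
    "2 years ago\n"
)
print("\n".join(drop_repeated_lines(text)))
# Somewhere in China there is a copy of this vid.
# Not sure really
```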
