How can I remove any repeated line in a multi-line string

Question

I have a multi-line string that has some repeated lines. I want to remove not just the repeated line, but also the "original" that is repeated.

I found some answers about removing just the repeated line while leaving the original, but I didn't know how to adapt them, and my attempt failed.

text = """<br/>
Somewhere in China there is a copy of this vid.<br/>
2 years ago<br/>
Not sure really<br/>
Aiur Productions<br/>
Aiur Productions<br/>
2 years ago<br/>
"""

lines_seen = set()  # holds lines already seen

for line in text:
       if line not in lines_seen:  # not a duplicate
            print(lines_seen.add(line))

I got several rows of "None". As mentioned, the code above comes from a different question, where the asker wanted to remove repeated lines but keep the non-repeated ones and one copy of each repeated one. What I want is output like this:

Somewhere in China there is a copy of this vid.
Not sure really

with all duplicated lines (e.g. "2 years ago") removed, so that only lines that were not repeated in the original are left.

Answer 1

set.add() doesn't return anything. When you try to print its return value, you thus get None. If you want to both print the line and put it into the set, you need to use two separate statements:

for line in text.splitlines():  # iterate over lines, not single characters
    if line not in lines_seen:  # not a duplicate
        print(line)
        lines_seen.add(line)

This will print every line once, at its first appearance. If you want to print only the lines that are never duplicated, then I would recommend keeping a parallel list of lines that have not been repeated so far:

lines_seen = set()
unique_lines = list()
for line in text.splitlines():  # iterate over lines, not single characters
    if line not in lines_seen:
        lines_seen.add(line)
        unique_lines.append(line)
    elif line in unique_lines:
        unique_lines.remove(line)
# and then print all the lines that were not removed from unique_lines on their second appearance
# in the order that they first appeared
for line in unique_lines:
    print(line)
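
To see where the rows of None in the question came from, here is a small sketch (the string `sample` is made up for illustration): set.add() returns None, and looping over a string directly yields single characters rather than lines, so something like splitlines() is needed to get whole lines.

```python
sample = "first\nsecond\nfirst"  # hypothetical sample, not the question's text

seen = set()
result = seen.add("first")  # add() mutates the set in place...
print(result)               # ...and returns None

chars = list(sample)[:5]    # iterating a str yields single characters
lines = sample.splitlines() # splitlines() yields whole lines
print(chars)
print(lines)
```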

Answer 2

from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
    'Counter that remembers the order elements are first encountered'

    def __repr__(self):
        return '%s(%r)' % (self.__class__.__name__, OrderedDict(self))

    def __reduce__(self):
        return self.__class__, (OrderedDict(self),)



updated = []

for k,v in OrderedCounter(text.split('<br/>')).items():
    if v == 1:
        updated.append(k)

print('<br/>'.join(updated))
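
On Python 3.7 and later, regular dicts preserve insertion order, and Counter is a dict subclass, so the OrderedCounter helper may not be needed there. A minimal sketch under that assumption; the string `sample` is hypothetical and only mimics the '<br/>'-separated shape of the question's text:

```python
from collections import Counter

# Hypothetical sample in the same '<br/>'-separated shape as the question's text.
sample = "a<br/>b<br/>a<br/>c"

# On Python 3.7+ the Counter keys keep the order in which they were first seen.
counts = Counter(sample.split('<br/>'))
updated = [k for k, v in counts.items() if v == 1]

print('<br/>'.join(updated))
```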

Answer 3

You can solve your problem using this approach:

count the occurrences of each line in the text;
select the lines occurring only once.

from collections import Counter

text = """<br/>
Somewhere in China there is a copy of this vid.<br/>
2 years ago<br/>
Not sure really<br/>
Aiur Productions<br/>
Aiur Productions<br/>
2 years ago<br/>
"""

str_counts = Counter(text.replace('<br/>', '').split('\n'))
result = '\n'.join([elem for elem in str_counts if str_counts[elem] == 1])

print(result)
# Somewhere in China there is a copy of this vid.
# Not sure really
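
Note that the empty strings are dropped above only because '' happens to occur twice in this input (once before the first line and once after the last). A slightly more defensive sketch, assuming that '<br/>' markers and blank lines should never appear in the output; the function name `lines_occurring_once` and the string `sample` are hypothetical:

```python
from collections import Counter


def lines_occurring_once(text, marker='<br/>'):
    """Return the lines of `text` that occur exactly once, in original order."""
    # Remove the marker and surrounding whitespace from every line.
    cleaned = [line.replace(marker, '').strip() for line in text.splitlines()]
    # Skip blank lines explicitly instead of relying on them being duplicated.
    cleaned = [line for line in cleaned if line]
    counts = Counter(cleaned)
    return [line for line in cleaned if counts[line] == 1]


sample = "<br/>\nalpha<br/>\nbeta<br/>\nalpha<br/>\ngamma"
print('\n'.join(lines_occurring_once(sample)))
```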

Answer 4

I am not 100% sure what you are asking, but I think you want to print all the lines except the ones that appear more than once.

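lines = []   # distinct lines, in order of first appearance
delete = []  # lines that occurred more than once
for line in text.split("\n"):
    if line in lines:
        if line not in delete:
            delete.append(line)
    else:
        lines.append(line)
# keep only the lines that were never repeated
lines = [line for line in lines if line not in delete]

This code isn't perfect but should convey the idea.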


4 answers

solution1
1 2019-10-23 15:19:37

solution2
1 2019-10-23 16:14:45

solution3
1 2019-10-23 19:27:10

solution4
0 2019-10-23 15:25:46
