How to remove characters after new line and space between words

Question

I have the list below:

a = ['\\ntest_ dev\\n$', 'pro gra', 'test\\n', 'test\\n']

I need to remove the spaces inside each element, strip out everything after the \\n, and remove duplicates from the list.

The expected output is ['test_dev', 'progra', 'test']

My code is below:

import re

def remove_tags(text):
    # Drop anything that looks like an HTML tag, then remove spaces
    tag_re = re.compile(r'<[^>]+>')
    remove_tag = tag_re.sub('', text)
    return remove_tag.replace(" ", "")

def remove_tags_newline(text):
    # Drop line feed characters, then remove spaces
    tag_re = re.compile(r'\n')
    remove_tag = tag_re.sub('', text)
    return remove_tag.replace(" ", "")

l = []
for i in a:
    s = remove_tags_newline(remove_tags(i))
    if s not in l:
        l.append(s)
l

My output is ['\\\\ntest_dev\\\\n$', 'progra', 'test'], but the expected output is ['test_dev', 'progra', 'test']

Answer 1

As you mentioned, you only have line feed characters in the input, not combinations of a backslash and the letter n.

In this case, you can fix your code by using:

def remove_tags_newline(text):
    return "".join(re.sub('(?s)\n.*', '', text.strip()).split())

It does the following:

re.sub('(?s)\n.*', '', text.strip()) - strips any leading/trailing whitespace characters, then removes the first remaining line feed character together with all text after it (note that (?s) is the inline equivalent of the re.S / re.DOTALL flag and lets . match across lines; \n matches a line feed character, and .* matches zero or more characters, as many as possible)
.split() - splits the string on whitespace
"".join(...) - concatenates all the strings from the list into a single string without adding any delimiter between the items (so, together with .split(), it removes all whitespace).

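The three steps above can be traced on a single element. This is only an illustrative sketch: the intermediate variable names (stripped, truncated, parts, result) are not part of the answer's code, and the input is assumed to contain real line feed characters.

```python
import re

# One element from the question's list, assuming real line feeds
text = '\ntest_ dev\n$'

stripped = text.strip()                       # 'test_ dev\n$'
truncated = re.sub('(?s)\n.*', '', stripped)  # 'test_ dev'
parts = truncated.split()                     # ['test_', 'dev']
result = "".join(parts)                       # 'test_dev'

print(result)
```
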
See the Python demo:

import re
a = ['\ntest_ dev\n$', 'pro gra', 'test\n', 'test\n']
def remove_tags_newline(text):
    return "".join(re.sub('(?s)\n.*', '', text.strip()).split())
print( [remove_tags_newline(x) for x in a] )
# => ['test_dev', 'progra', 'test', 'test']
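
The demo above still returns 'test' twice, while the question also asks for duplicates to be removed. One possible way to do that, not part of the original answer, is to pass the cleaned strings through dict.fromkeys, which keeps the first occurrence of each value in insertion order (guaranteed from Python 3.7). The variable name unique is only for illustration.

```python
import re

a = ['\ntest_ dev\n$', 'pro gra', 'test\n', 'test\n']

def remove_tags_newline(text):
    # Same function as in the answer above
    return "".join(re.sub('(?s)\n.*', '', text.strip()).split())

# dict.fromkeys drops repeated keys while preserving first-seen order
unique = list(dict.fromkeys(remove_tags_newline(x) for x in a))
print(unique)
# => ['test_dev', 'progra', 'test']
```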

solution1
1 ACCEPTED 2021-06-16 22:23:34