
How to remove characters after new line and space between words

I have the list below:

a = ['\\ntest_ dev\\n$', 'pro gra', 'test\\n', 'test\\n']

  • I need to remove the spaces inside each element and strip out everything after the \n

  • I need to remove duplicates from the list

The expected output is ['test_dev', 'progra', 'test']

My code is below:

import re

def remove_tags(text):
    # Strip HTML-like tags, then remove spaces
    tag_re = re.compile(r'<[^>]+>')
    remove_tag = tag_re.sub('', text)
    return remove_tag.replace(" ", "")

def remove_tags_newline(text):
    # Remove line feed characters, then remove spaces
    tag_re = re.compile(r'\n')
    remove_tag = tag_re.sub('', text)
    return remove_tag.replace(" ", "")

l = []
for i in a:
    s = remove_tags_newline(remove_tags(i))
    if s not in l:  # keep only the first occurrence
        l.append(s)
print(l)

My output is ['\\ntest_dev\\n$', 'progra', 'test'], but the expected output is ['test_dev', 'progra', 'test']

As you mentioned, the input contains only line feed characters, not two-character combinations of a backslash and n.

In this case, you can fix your code by using:

def remove_tags_newline(text):
    return "".join(re.sub('(?s)\n.*', '', text.strip()).split())

It does the following:

  • re.sub('(?s)\n.*', '', text.strip()) - first strips any leading/trailing whitespace chars, then removes the first line feed char together with all text after it (note that (?s) is the inline equivalent of re.S / re.DOTALL and lets . match across lines, \n matches an LF char, and .* matches any zero or more chars, as many as possible)
  • .split() - splits the string on whitespace
  • "".join(...) - concatenates all the strings from the list into a single string without adding any delimiter between the items (thus, together with .split(), it removes all whitespace).
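
To make the three steps above easier to follow, here is a small sketch that applies them one at a time to the first list element. The intermediate variable names (stripped, truncated, parts, result) are only illustrative and are not part of the original answer.

```python
import re

text = '\ntest_ dev\n$'

# Step 1: drop leading/trailing whitespace (here, the leading line feed)
stripped = text.strip()

# Step 2: remove the first line feed and everything after it
truncated = re.sub('(?s)\n.*', '', stripped)

# Step 3: split on whitespace, then join the pieces with no separator
parts = truncated.split()
result = "".join(parts)

print(repr(stripped))   # 'test_ dev\n$'
print(repr(truncated))  # 'test_ dev'
print(parts)            # ['test_', 'dev']
print(result)           # test_dev
```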

See the Python demo:

import re
a = ['\ntest_ dev\n$', 'pro gra', 'test\n', 'test\n']
def remove_tags_newline(text):
    return "".join(re.sub('(?s)\n.*', '', text.strip()).split())
print( [remove_tags_newline(x) for x in a] )
# => ['test_dev', 'progra', 'test', 'test']
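
The demo output still contains 'test' twice, while the question also asks for duplicates to be removed. One possible way to do that while keeping the original order is dict.fromkeys, sketched below. This assumes Python 3.7+, where dicts preserve insertion order; the name unique is only illustrative.

```python
import re

a = ['\ntest_ dev\n$', 'pro gra', 'test\n', 'test\n']

def remove_tags_newline(text):
    return "".join(re.sub('(?s)\n.*', '', text.strip()).split())

# dict keys are unique and keep first-seen order, so this drops later duplicates
unique = list(dict.fromkeys(remove_tags_newline(x) for x in a))
print(unique)
# => ['test_dev', 'progra', 'test']
```

If order does not matter, set(...) would also work, but it does not guarantee the order shown in the expected output.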
