I have the list below:
a = ['\\ntest_ dev\\n$', 'pro gra', 'test\\n', 'test\\n']
I need to remove the spaces inside each element and strip out everything after the \\n.
I also need to remove duplicates from the list.
Expected output is ['test_dev', 'progra', 'test']
My code is below:
import re

def remove_tags(text):
    tag_re = re.compile(r'<[^>]+>')
    remove_tag = tag_re.sub('', text)
    return remove_tag.replace(" ", "")

def remove_tags_newline(text):
    tag_re = re.compile(r'\n')
    remove_tag = tag_re.sub('', text)
    return remove_tag.replace(" ", "")

l = []
for i in a:
    s = remove_tags_newline(remove_tags(i))
    if s not in l:
        l.append(s)
l
My output is ['\\\\ntest_dev\\\\n$', 'progra', 'test']
Expected output is ['test_dev', 'progra', 'test']
As you mentioned, you only have line feed chars in the input, not combinations of a backslash and n.
In this case, you can fix your code by using
def remove_tags_newline(text):
    return "".join(re.sub('(?s)\n.*', '', text.strip()).split())
It does the following:
re.sub('(?s)\n.*', '', text.strip()) - removes any leading/trailing whitespace chars and then removes the first line feed char together with any text after it (note that (?s) is the inline equivalent of the re.S / re.DOTALL flag that lets . match across lines, \n matches an LF char, and .* matches any zero or more chars, as many as possible)
.split() - splits the string on whitespace
"".join(...) - concatenates all the strings from the list into a single string without adding any delimiters between the items (thus, together with .split(), it removes any whitespace).
See the Python demo:
import re
a = ['\ntest_ dev\n$', 'pro gra', 'test\n', 'test\n']
def remove_tags_newline(text):
    return "".join(re.sub('(?s)\n.*', '', text.strip()).split())
print([remove_tags_newline(x) for x in a])
# => ['test_dev', 'progra', 'test', 'test']
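The demo output still contains 'test' twice, while the question also asks for duplicates to be removed. One possible way to do that while keeping the original order is sketched below. The helper name clean_unique is hypothetical (it does not appear in the original posts), and the sketch assumes Python 3.7+, where dict keeps insertion order.

```python
import re

a = ['\ntest_ dev\n$', 'pro gra', 'test\n', 'test\n']

def remove_tags_newline(text):
    # Same cleaning function as in the demo above
    return "".join(re.sub('(?s)\n.*', '', text.strip()).split())

def clean_unique(items):
    # Hypothetical helper: dict.fromkeys keeps only the first occurrence
    # of each cleaned string and preserves insertion order (Python 3.7+)
    return list(dict.fromkeys(remove_tags_newline(x) for x in items))

print(clean_unique(a))
# => ['test_dev', 'progra', 'test']
```

Compared with the "if s not in l" loop from the question, this avoids a linear scan of the result list for every element, which may matter for long lists.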