I have an array with words, some ending with special characters. I would like all the special characters at the end of the words to be deleted. Is there an elegant way to do it?
aArray=["palabra...","algo,.", "si ...", "onomatopeña", "asi;","www.google.com"]
Desired output:
aArray=["palabra","algo", "si", "onomatopeña", "asi","www.google.com"]
I was trying this:
import re
rxx = re.compile(r'(.*)([.,]{2,})') # Extend [.,] as needed; {2,} means >= 2
aArray=["encontarla....", "esta,.", "sr.", "texto", 'www.google.com', 'encontrarla.']
aArray=[rxx.sub(lambda m: m.group(1), word) for word in aArray]
I don't think I fully understand how this works. For example, the string www.google.com is a URL, so its dots should not be removed.
You can use a regular expression to do that. Your question does not define 'special characters' very clearly, but here is sample code that gives the output you posted:
import re
aArray=["palabra...","algo,.", "si ...", "onomatopeña", "asi;", "www.google.com"]
for i in range(len(aArray)):
    aArray[i] = re.sub(r'[.,;]+$', '', aArray[i]).strip()
Output:
['palabra', 'algo', 'si', 'onomatopeña', 'asi', 'www.google.com']
If by 'special character' you mean any non-alphanumeric character, then you can use this:
import re
aArray=["palabra...","algo,.", "si ...", "onomatopeña", "asi;", "www.google.com"]
for i in range(len(aArray)):
    aArray[i] = re.sub(r'[^\w]+$', '', aArray[i]).strip()
Output:
['palabra', 'algo', 'si', 'onomatopeña', 'asi', 'www.google.com']
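Also note the strip() call: it removes any whitespace left at the ends of the string, such as the space that remains in "si " in the first example once the dots are gone.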
UPDATE
The $ at the end of the regular expression anchors the match to the end of the string, so only characters at the very end are removed. That is why the dots inside www.google.com are left untouched.
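To see the effect of the anchor, here is a small sketch that compares the anchored pattern with an unanchored one. The sample strings are made up for illustration.

```python
import re

words = ["www.google.com", "www.google.com...", "algo,."]

# Anchored: only the run of punctuation at the very end is removed.
anchored = [re.sub(r'[.,;]+$', '', w) for w in words]

# Unanchored: every run of punctuation is removed, which breaks the URL.
unanchored = [re.sub(r'[.,;]+', '', w) for w in words]

print(anchored)    # ['www.google.com', 'www.google.com', 'algo']
print(unanchored)  # ['wwwgooglecom', 'wwwgooglecom', 'algo']
```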
To strip all non-word characters only from the end of the strings:
import re
aArray = ["palabra...", "algo,.", "si ...", "onomatopeña", "asi;", "www.google.com"]
aArray = [re.sub(r'\W+$', '', s) for s in aArray]
Result:
['palabra', 'algo', 'si', 'onomatopeña', 'asi', 'www.google.com']
Explanation:
\W+ matches one or more non-word characters, and $ anchors the match to the end of the string.
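One thing worth keeping in mind: for str patterns in Python 3, \w is Unicode-aware by default, and it also includes the underscore. The sketch below uses invented strings to show both effects, and what changes if the re.ASCII flag is passed.

```python
import re

word = 'caf\u00e9!!'  # 'café!!', written with an escape to avoid encoding surprises

# Letters such as é or ñ count as word characters by default, so they are kept.
unicode_result = re.sub(r'\W+$', '', word)

# The underscore is a word character too, so it is not stripped.
underscore_result = re.sub(r'\W+$', '', 'nombre_...')

# With re.ASCII, é is no longer a word character and is removed as well.
ascii_result = re.sub(r'\W+$', '', word, flags=re.ASCII)

print(unicode_result)     # café
print(underscore_result)  # nombre_
print(ascii_result)       # caf
```

If a trailing underscore should also go, an explicit character class such as the [.,;] one used elsewhere in this thread may be a better fit than \W.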
This can be done with a list comprehension and str.rstrip, without needing a regex:
>>> aArray=["palabra...","algo,.", "si ...", "onomatopeña", "asi;","www.google.com"]
>>> [s.rstrip('.;, ') for s in aArray]
['palabra', 'algo', 'si', 'onomatopeña', 'asi', 'www.google.com']
Note that I'm assuming '.;, ' (period, semicolon, comma and space) are all the "special characters" you're referring to.
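If the set of characters is not known in advance, one option is to build it from the standard library's string.punctuation and string.whitespace constants. This is only a sketch: the helper name strip_trailing and the constant TRAILING are hypothetical, and string.punctuation covers ASCII punctuation only (including the underscore).

```python
import string

# ASCII punctuation plus whitespace; extend this if other characters show up.
TRAILING = string.punctuation + string.whitespace

def strip_trailing(words):
    """Return a new list with trailing punctuation and whitespace removed."""
    return [w.rstrip(TRAILING) for w in words]

aArray = ["palabra...", "algo,.", "si ...", "onomatopeña", "asi;", "www.google.com"]
print(strip_trailing(aArray))
# ['palabra', 'algo', 'si', 'onomatopeña', 'asi', 'www.google.com']
```

Because rstrip only looks at the right-hand end of each string, the dots inside www.google.com are never touched.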