I see a lot of similarly worded questions, but I've had a strikingly difficult time coming up with the syntax for this.
Given a list of words, I want to print all the words that do not have special characters.
I have a regex which identifies words with special characters \\w*[\À-\ǚ']\\w*
. I've seen a lot of answers with fairly straightforward scenarios like a simple word . However, I haven't been able to find anything that negates a group - I've seen several different sets of syntax to include the negative lookahead ?!
, but I haven't been able to come up with a syntax that works with it.
In my case given a string like: "should print nŌt thìs"
should print should
and print
but not the other two words. re.findall("(\\w*[\À-\ǚ']\\w*)", paragraph.text)
gives you the special characters - I just want to invert that.
For this particular case, you can simply specify the regular alphabet range in your search:
a = "should print nŌt thìs"
re.findall(r"(\b[A-Za-z]+\b)", a)
# ['should', 'print']
Of course you can add digits or anything else you want to match as well.
As for negative lookaheads, they use the syntax (?!...)
, with ?
before !
, and they must be in parentheses. To use one here, you can use:
r"\b(?!\w*[À-ǚ])\w*"
This:
\\b
, like a space or the start of the input string.\\w*
because (?![À-ǚ])
would only check for the special character being the first letter in the word. Demo . Note in regex101.com you must specify Python flavor for \\b
to work properly with special characters.
There is a third option as well:
r"\b[^À-ǚ\s]*\b"
The middle part [^À-ǚ\\s]*
means match any character other than special characters or whitespace an unlimited number of times.
I know this is not a regex, but just a completely different idea you may not have had besides using regexes. I suppose it would be also much slower but I think it works:
>>> import unicodedata as ud
>>> [word for word in ['Cá', 'Lá', 'Aqui']\
if any(['WITH' in ud.name(letter) for letter in word])]
['Cá', 'Lá']
Or use ... 'WITH' not in
to reverse.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.