How to match words with no vowel?

What counts as a vowel or a word can be subjective, so I have this set of rules:

  • A vowel is any of a, e, i, o, u. Not y.
  • A word is a sequence of English letters, a-z or A-Z.
  • \n, , (comma), . (period), and the space character are not part of a word.

I have the following string:

text = """line with every word a vowel
sntshk xx yy.
Okay zz fine."""

My try:

s = re.findall(r'[^aeiouAEIOU].*', text)
print(s)

Expectation:

['sntshk', 'xx', 'yy', 'zz']

Reality:

['line with every word a vowel', '\nsntshk xx yy.', '\nOkay zz fine.']

Related: Search all words with no vowels

I would just target words using the pattern \b[^AEIOU_0-9\W]+\b in case-insensitive mode:

import re

text = """line with every word a vowel
sntshk xx yy.
Okay zz fine."""

# Assign the result so that print(s) has something to show
s = re.findall(r'\b[^AEIOU_0-9\W]+\b', text, flags=re.I)
print(s)

['sntshk', 'xx', 'yy', 'zz']

The pattern [^\W] is in fact a double negative and means any word character. To this negated class we add the vowels, digits, and underscore, which leaves only consonants.
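
As a rough illustration of what that negated class accepts and rejects, here is a small sketch. The sample string and the name CONSONANT_WORD are made up for this example and do not come from the question.

```python
import re

# Word characters minus vowels, digits, and underscore (case-insensitive).
CONSONANT_WORD = re.compile(r'\b[^AEIOU_0-9\W]+\b', flags=re.I)

# Hypothetical input: mixed case, a comma, an underscore, and a digit.
sample = "Hmm, brr x_y z9 TSK try again"

result = CONSONANT_WORD.findall(sample)
print(result)  # ['Hmm', 'brr', 'TSK', 'try']
```

Tokens such as x_y and z9 are skipped because the underscore and the digit are word characters, so no word boundary falls between them and the neighbouring letters.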

Use an ordinary character set composed of alphabetical characters, excluding the vowels, with word boundaries at each end:

(?i)\b[b-df-hj-np-tv-z]+\b

https://regex101.com/r/DqGuY1/1

  • (?i) - Case-insensitive match
  • \b - Word boundary
  • [b-df-hj-np-tv-z]+ - Repeat one or more of:
    • characters in the range b-d, f-h, j-n, p-t, or v-z
  • \b - Word boundary

More readably, but less elegantly, you could also use:

(?i)\b(?:(?![eiou])[b-z])+\b
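
Both patterns from this answer can be tried in Python against the text from the question. This is only a quick sketch; the variable names are chosen for the example.

```python
import re

text = """line with every word a vowel
sntshk xx yy.
Okay zz fine."""

# Explicit consonant ranges.
range_pattern = r'(?i)\b[b-df-hj-np-tv-z]+\b'
# b-z, with a negative lookahead that rejects e, i, o, u at each position.
lookahead_pattern = r'(?i)\b(?:(?![eiou])[b-z])+\b'

range_matches = re.findall(range_pattern, text)
lookahead_matches = re.findall(lookahead_pattern, text)

print(range_matches)      # ['sntshk', 'xx', 'yy', 'zz']
print(lookahead_matches)  # ['sntshk', 'xx', 'yy', 'zz']
```

The group in the second pattern is non-capturing, so re.findall returns the whole match rather than the group contents.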

There is a pure Python way to do this without any imports:

[x.strip('.') for x in text.split() if all(y.lower() not in 'aeiou' for y in x)]

Example:

text = """line with every word a vowel 
sntshk xx yy.
Okay zz fine."""

print([x.strip('.') for x in text.split() if all(y.lower() not in 'aeiou' for y in x)])
# ['sntshk', 'xx', 'yy', 'zz']
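
The one-liner above only strips periods, so a token such as "zz," would keep its comma. A possible variant that also strips commas and accepts only ASCII letters, following the rules in the question, might look like the sketch below. The function name vowel_free_words and the sample string are assumptions made for illustration.

```python
VOWELS = set("aeiouAEIOU")

def vowel_free_words(text):
    """Return whitespace-separated tokens made only of non-vowel ASCII letters."""
    words = []
    for token in text.split():
        word = token.strip(".,")  # drop surrounding periods and commas
        # Keep the word only if it is all ASCII letters and shares no vowel.
        if word.isascii() and word.isalpha() and not VOWELS & set(word):
            words.append(word)
    return words

# Hypothetical input with commas as well as periods.
sample = "sntshk, xx yy. Okay zz, fine."
words = vowel_free_words(sample)
print(words)  # ['sntshk', 'xx', 'yy', 'zz']
```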

[^aeiouAEIOU]

This means "match anything except aeiouAEIOU", so it also matches non-alphabetic characters such as spaces, newlines, and punctuation, and the .* that follows then consumes the rest of the line. That is not what you want, since you only want words.

So simply match all the letters other than vowels:

\b[bcdfghjklmnpqrstvwxyz]+\b


This works:

text = """line with every word a vowel
sntshk xx yy.
Okay zz fine."""
q = ''
s = text.split()
for i in range(len(s)):
    s[i] = s[i].strip('.')
    for c in range(len(s[i])):
        if s[i][c].lower() in 'aeiou':
            # Found a vowel, so skip this word
            break
    else:
        # The loop finished without hitting a vowel: keep the word
        q += s[i] + ' '
print(q)
