
Python regex to calculate vowel/consonant ratio in English

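I've started a fairly silly linguistics project to learn regular expressions in Python. I'm pretty sure I can avoid passing over the same string several times and find a more "compact" and "pythonic" way to do what I want, which is to use regular expressions to decide whether each 'Y|y' counts as a vowel or a consonant. At the bottom of the snippet I've put a comment block listing 20 words that contain 12 vowel y's and 9 consonant y's. It seems the code could be simplified and the re.compile lines merged.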

import re
vowelRegex = re.compile(r'[aeiouAEIOU]')
consoRegex = re.compile(r'[b-df-hj-np-tv-xzB-DF-HJ-NP-TV-XZ]')
yconsRegex = re.compile(r'[aeiou]y[aeiou]') 
ycon2Regex = re.compile(r'\bY')
yVowlRegex = re.compile(r'[b-df-hj-np-tv-xzB-DF-HJ-NP-TV-XZ]y[b-df-hj-np-tv-xz]') 
yVow2Regex = re.compile(r'y\b')

#thestring = 'Sky Family Yurt Germany Crypt Day New York Pennsylvania Myth Hungry Yolk Year Bayou Yak Silly Beyond Dynamite Mystery Yacht Yoda'
#thestring = 'Crypt Pennsylva Myth Dynamite Mystery'
#thestring='RoboCop eats baby food. Pennsylvania Baby Food in the bayou. And, New York is where I\'d Rather be!'
thestring='violent irrational intolerant allied to racism and ' \
    'tribalism bigotry invested in ignorance and hostile to free '\
    'inquiry contemptuous of women and coercive towards children ' \
    'organized religion ought to have a great deal on its conscience ' \
    'Yak yacht beyond mystery'
# Plain vowels and plain consonants (y is handled separately below).
fun = vowelRegex.findall(thestring)
nofun = consoRegex.findall(thestring)

# Vowel y: sandwiched between consonants (keep only the middle character),
# or at the end of a word.
fun += [f[1] for f in yVowlRegex.findall(thestring)]
fun += yVow2Regex.findall(thestring)

# Consonant y: sandwiched between vowels, or a capital Y starting a word.
nofun += [f[1] for f in yconsRegex.findall(thestring)]
nofun += ycon2Regex.findall(thestring)

print(thestring)
print('Vowels:',''.join(fun), len(''.join(fun)))
print('Consos:',''.join(nofun), len(''.join(nofun)))


'''
Sky         Vowel; endswith 1
Family      Vowel; endswith 2 
Yurt        Consonant; begswith 1
Germany     Vowel; endswith 3
Crypt       Vowel; sandwiched 1
Day         Vowel; endswith 4
New York    Consonant; begswith 2
Pennsylva   Vowel; sandwiched 2
Myth        Vowel; sandwiched 3
Hungry      Vowel; endswith 5
Yolk        Consonant; begswith 3
Year        Consonant; begswith 4
Bayou       Consonanwich 1
Yak         Consonant; begswith 5
Silly       Vowel; endswith 6
Beyond      Consonanwich 2
Dynamite    Vowel; sandwiched 4
Mystery     Vowel; sandwiched, Vowel; endswith!
Yacht       Consonant; begswith 6
Yoda        Consonant; begswith 7
'''

You can use the or operator (|) inside a regular expression, which cuts things down a little. For example:

yVowlRegex = re.compile(r'[b-df-hj-np-tv-xzB-DF-HJ-NP-TV-XZ]y[b-df-hj-np-tv-xz]|y\b') 

This one pattern now covers both yVowl and yVow2.
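One wrinkle with the merged pattern is that the sandwiched branch returns three characters (for example 'ryp') while the end-of-word branch returns just 'y', so the matches still need post-processing. A possible variation, not part of the answer above, is to turn the surrounding consonants into lookbehind/lookahead assertions so that every match is the y alone. The sample string here is only an illustration:

```python
import re

# Merged pattern as given in the answer: the sandwiched branch
# also captures the neighbouring consonants.
merged = re.compile(
    r'[b-df-hj-np-tv-xzB-DF-HJ-NP-TV-XZ]y[b-df-hj-np-tv-xz]|y\b'
)

# Sketch of a lookaround variant (an assumption, not the answer's code):
# the consonants are asserted but not consumed, so only 'y' is returned.
yVowlRegex = re.compile(
    r'(?<=[b-df-hj-np-tv-xzB-DF-HJ-NP-TV-XZ])y(?=[b-df-hj-np-tv-xz])|y\b'
)

sample = 'Crypt Myth Sky Mystery Bayou'
mixed_matches = merged.findall('Crypt Sky')    # ['ryp', 'y']
vowel_ys = yVowlRegex.findall(sample)          # ['y', 'y', 'y', 'y', 'y']
print(mixed_matches)
print(vowel_ys)
```

With this form the `f[1]` extraction step is no longer needed, since each match is already a single character.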

@Joshua-Lewis's answer led me to the following approach for simplifying the code above:

import re
vowelRegex = re.compile(r'[aeiouAEIOU]|[b-df-hj-np-tv-xzB-DF-HJ-NP-TV-XZ]y[b-df-hj-np-tv-xz]|y\b')
consoRegex = re.compile(r'[b-df-hj-np-tv-xzB-DF-HJ-NP-TV-XZ]|[aeiou]y[aeiou]|\bY')
vowelRescan = re.compile(r'[aeiouyAEIOUY]')
consoRescan = re.compile(r'[b-df-hj-np-tv-xyzB-DF-HJ-NP-TV-XYZ]')
thestring='any and every religion is violent irrational intolerant '\
    'allied to racism and tribalism bigotry invested in ignorance and '\
    'hostile to free inquiry contemptuous of women and coercive towards '\
    'children organized religion ought to have a great deal on its '\
    'conscience why it continues toward the 22nd century ACE is a mystery '\
    'known only to New Yorkers and lovers of the bayou'
fun=vowelRegex.findall(thestring)
funn=''.join(fun)
fun = ''.join(vowelRescan.findall(funn))
nofun=consoRegex.findall(thestring)
nofunn=''.join(nofun)
nofun=''.join(consoRescan.findall(nofunn))

print(thestring)
print('Vowels:',fun, len(fun))
print('Consos:',nofun, len(nofun))



'''
Sky         Vowel; endswith 1
Family      Vowel; endswith 2 
Yurt        Consonant; begswith 1
Germany     Vowel; endswith 3
Crypt       Vowel; sandwiched 1
Day         Vowel; endswith 4
New York    Consonant; begswith 2
Pennsylva   Vowel; sandwiched 2
Myth        Vowel; sandwiched 3
Hungry      Vowel; endswith 5
Yolk        Consonant; begswith 3
Year        Consonant; begswith 4
Bayou       Consonanwich 1
Yak         Consonant; begswith 5
Silly       Vowel; endswith 6
Beyond      Consonanwich 2
Dynamite    Vowel; sandwiched 4
Mystery     Vowel; sandwiched, Vowel; endswith!
Yacht       Consonant; begswith 6
Yoda        Consonant; begswith 7
'''
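The rescan step above could probably be avoided altogether. The following is a minimal sketch, assuming the same four y rules (sandwiched between consonants or word-final means vowel; sandwiched between vowels or word-initial means consonant), that classifies every letter in one pass with a single compiled pattern and then computes the ratio the title asks about. The names `LETTER`, `CONSONANTS` and `classify_letters` are hypothetical. Unlike the patterns above, it uses `re.IGNORECASE`, so a lowercase word-initial y (as in "yak") also counts as a consonant.

```python
import re

# Assumption: same consonant ranges as the patterns above, with y left out.
CONSONANTS = "b-df-hj-np-tv-xz"

# Hypothetical single pattern; lookarounds keep each match to one letter.
LETTER = re.compile(
    rf"""
    (?P<vowel>
        [aeiou]
      | (?<=[{CONSONANTS}]) y (?=[{CONSONANTS}])   # y between two consonants
      | y \b                                       # y ending a word
    )
  | (?P<conso>
        [{CONSONANTS}]
      | (?<=[aeiou]) y (?=[aeiou])                 # y between two vowels
      | \b y                                       # y starting a word
    )
    """,
    re.IGNORECASE | re.VERBOSE,
)


def classify_letters(text):
    """Return (vowels, consonants) as two lists of single characters."""
    vowels, consonants = [], []
    for m in LETTER.finditer(text):
        if m.group("vowel") is not None:
            vowels.append(m.group())
        else:
            consonants.append(m.group())
    return vowels, consonants


words = ('Sky Family Yurt Germany Crypt Day New York Pennsylvania Myth '
         'Hungry Yolk Year Bayou Yak Silly Beyond Dynamite Mystery Yacht Yoda')
vowels, consonants = classify_letters(words)
print('Vowels:', ''.join(vowels), len(vowels))
print('Consos:', ''.join(consonants), len(consonants))
if consonants:
    print('Vowel/consonant ratio:', len(vowels) / len(consonants))
```

On the 20-word list, the y tallies come out at 12 vowel y's and 9 consonant y's, matching the comment block. A y that fits none of the four rules (for instance the one in "crying", which sits between a consonant and a vowel) is left uncounted, which is a gap the original patterns share.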
