Python regex to calculate vowel/consonant ratio in English
I've embarked on a reasonably dumb linguistics project to learn regular expressions in Python. I'm pretty sure I could avoid the multiple passes over the same string and find a more "compact" and "pythonic" way to do what I'm trying to do, which is: use regex to decide whether 'Y|y' in a word is a vowel or a consonant. At the bottom of the code segment, I've put a comment block listing 20 words that contain 12 vowel y's and 9 consonant y's. It seems like the code could be simplified and the re.compile lines merged together.
import re
vowelRegex = re.compile(r'[aeiouAEIOU]')
consoRegex = re.compile(r'[b-df-hj-np-tv-xzB-DF-HJ-NP-TV-XZ]')
yconsRegex = re.compile(r'[aeiou]y[aeiou]')
ycon2Regex = re.compile(r'\bY')
yVowlRegex = re.compile(r'[b-df-hj-np-tv-xzB-DF-HJ-NP-TV-XZ]y[b-df-hj-np-tv-xz]')
yVow2Regex = re.compile(r'y\b')
#thestring = 'Sky Family Yurt Germany Crypt Day New York Pennsylvania Myth Hungry Yolk Year Bayou Yak Silly Beyond Dynamite Mystery Yacht Yoda'
#thestring = 'Crypt Pennsylva Myth Dynamite Mystery'
#thestring='RoboCop eats baby food. Pennsylvania Baby Food in the bayou. And, New York is where I\'d Rather be!'
thestring='violent irrational intolerant allied to racism and ' \
'tribalism bigotry invested in ignorance and hostile to free '\
'inquiry contemptuous of women and coercive towards children ' \
'organized religion ought to have a great deal on its conscience ' \
'Yak yacht beyond mystery'
fun=vowelRegex.findall(thestring)
nofun=consoRegex.findall(thestring)
funny = yVowlRegex.findall(thestring)
foony = []
for f in funny:
    foony.append(f[1])
fun += foony
fun += yVow2Regex.findall(thestring)
notfunny = yconsRegex.findall(thestring)
foony = []
for f in notfunny:
    foony.append(f[1])
nofun += foony
nofun += ycon2Regex.findall(thestring)
print(thestring)
print('Vowels:',''.join(fun), len(''.join(fun)))
print('Consos:',''.join(nofun), len(''.join(nofun)))
'''
Sky Vowel; endswith 1
Family Vowel; endswith 2
Yurt Consonant; begswith 1
Germany Vowel; endswith 3
Crypt Vowel; sandwiched 1
Day Vowel; endswith 4
New York Consonant; begswith 2
Pennsylva Vowel; sandwiched 2
Myth Vowel; sandwiched 3
Hungry Vowel; endswith 5
Yolk Consonant; begswith 3
Year Consonant; begswith 4
Bayou Consonanwich 1
Yak Consonant; begswith 5
Silly Vowel; endswith 6
Beyond Consonanwich 2
Dynamite Vowel; sandwiched 4
Mystery Vowel; sandwiched, Vowel; endswith!
Yacht Consonant; begswith 6
Yoda Consonant; begswith 7
'''
You can use the alternation operator (|) in a regex to merge patterns, which shortens the code a bit. For example:
yVowlRegex = re.compile(r'[b-df-hj-np-tv-xzB-DF-HJ-NP-TV-XZ]y[b-df-hj-np-tv-xz]|y\b')
This single pattern now covers both yVowlRegex and yVow2Regex.
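As a quick sanity check, the sketch below compares the two separate patterns with the merged one. The sample string and the names sandwiched, trailing, and merged are made up for illustration; only the patterns themselves come from the question.

```python
import re

# The two original "y as vowel" patterns from the question.
sandwiched = re.compile(r'[b-df-hj-np-tv-xzB-DF-HJ-NP-TV-XZ]y[b-df-hj-np-tv-xz]')
trailing = re.compile(r'y\b')

# The same two patterns merged with the alternation operator.
merged = re.compile(r'[b-df-hj-np-tv-xzB-DF-HJ-NP-TV-XZ]y[b-df-hj-np-tv-xz]|y\b')

sample = 'Crypt Sky Mystery'  # hypothetical test input

# Two passes, results concatenated.
separate = sandwiched.findall(sample) + trailing.findall(sample)
# One pass, results in the order they occur in the string.
combined = merged.findall(sample)

print(separate)
print(combined)
```

Both approaches find the same matches for this input; only the order differs, because the merged pattern reports matches in the order they appear in the string.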
@Joshua-Lewis's answer led me to the following way to streamline the code above:
import re
vowelRegex = re.compile(r'[aeiouAEIOU]|[b-df-hj-np-tv-xzB-DF-HJ-NP-TV-XZ]y[b-df-hj-np-tv-xz]|y\b')
consoRegex = re.compile(r'[b-df-hj-np-tv-xzB-DF-HJ-NP-TV-XZ]|[aeiou]y[aeiou]|\bY')
vowelRescan = re.compile(r'[aeiouyAEIOUY]')
consoRescan = re.compile(r'[b-df-hj-np-tv-xyzB-DF-HJ-NP-TV-XYZ]')
thestring='any and every religion is violent irrational intolerant '\
'allied to racism and tribalism bigotry invested in ignorance and '\
'hostile to free inquiry contemptuous of women and coercive towards '\
'children organized religion ought to have a great deal on its '\
'conscience why it continues toward the 22nd century ACE is a mystery '\
'known only to New Yorkers and lovers of the bayou'
fun=vowelRegex.findall(thestring)
funn=''.join(fun)
fun = ''.join(vowelRescan.findall(funn))
nofun=consoRegex.findall(thestring)
nofunn=''.join(nofun)
nofun=''.join(consoRescan.findall(nofunn))
print(thestring)
print('Vowels:',fun, len(fun))
print('Consos:',nofun, len(nofun))
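The rescan step is only needed because the y patterns also capture the neighbouring letters. One possible way around that, offered as a sketch rather than a definitive solution, is to use lookarounds so that only the y itself is matched. The names CONS, VOWEL, CONSONANT, and classify are hypothetical, and using re.IGNORECASE is an assumption that changes behaviour slightly: a lowercase y at the start of a word now counts as a consonant, whereas the code above only matches a capital Y there.

```python
import re

# Consonant letters other than y, reused inside character classes.
CONS = 'b-df-hj-np-tv-xz'

# y is treated as a vowel when sandwiched between consonants or at the end
# of a word. \B before y keeps a one-letter word "y" out of this case.
VOWEL = re.compile(
    rf'[aeiou]|(?<=[{CONS}])y(?=[{CONS}])|\By\b', re.IGNORECASE)

# y is treated as a consonant between two vowels or at the start of a word.
CONSONANT = re.compile(
    rf'[{CONS}]|(?<=[aeiou])y(?=[aeiou])|\by', re.IGNORECASE)


def classify(text):
    """Return (vowels, consonants) as lists of single letters."""
    return VOWEL.findall(text), CONSONANT.findall(text)


words = ('Sky Family Yurt Germany Crypt Day New York Pennsylvania Myth '
         'Hungry Yolk Year Bayou Yak Silly Beyond Dynamite Mystery Yacht Yoda')
vowels, consonants = classify(words)

vowel_ys = sum(1 for v in vowels if v.lower() == 'y')
consonant_ys = sum(1 for c in consonants if c.lower() == 'y')
print('Vowel y:', vowel_ys, 'Consonant y:', consonant_ys)
print('Vowel/consonant ratio:', len(vowels) / len(consonants))
```

On the 20-word test list this finds 12 vowel y's and 9 consonant y's, matching the counts in the question. Like the original patterns, it leaves a y unclassified when it falls under none of the four rules (for example the y in "boys").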