
python regex to calculate vowel/consonant ratio in English

I've embarked on a reasonably dumb linguistics project to learn regular expressions in Python. What I'm trying to do is use regexes to decide whether a 'Y' or 'y' in a word is acting as a vowel or a consonant. I'm pretty sure I could avoid making multiple passes over the same string and find a more "compact" and "pythonic" way to do it. At the bottom of the code, I've put a comment block listing 20 words that contain 12 vowel y's and 9 consonant y's. It seems like the code could be simplified and the re.compile lines merged together.

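import re

# Plain vowels and plain consonants (y/Y is deliberately excluded from both).
vowelRegex = re.compile(r'[aeiouAEIOU]')
consoRegex = re.compile(r'[b-df-hj-np-tv-xzB-DF-HJ-NP-TV-XZ]')
# y as a consonant: between two vowels (bayou) or at the start of a word (Yak, yacht).
yconsRegex = re.compile(r'[aeiou]y[aeiou]')
ycon2Regex = re.compile(r'\b[Yy]')
# y as a vowel: between two consonants (Crypt, Myth) or at the end of a word (Sky).
yVowlRegex = re.compile(r'[b-df-hj-np-tv-xzB-DF-HJ-NP-TV-XZ]y[b-df-hj-np-tv-xz]')
yVow2Regex = re.compile(r'y\b')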

#thestring = 'Sky Family Yurt Germany Crypt Day New York Pennsylvania Myth Hungry Yolk Year Bayou Yak Silly Beyond Dynamite Mystery Yacht Yoda'
#thestring = 'Crypt Pennsylva Myth Dynamite Mystery'
#thestring='RoboCop eats baby food. Pennsylvania Baby Food in the bayou. And, New York is where I\'d Rather be!'
thestring='violent irrational intolerant allied to racism and ' \
    'tribalism bigotry invested in ignorance and hostile to free '\
    'inquiry contemptuous of women and coercive towards children ' \
    'organized religion ought to have a great deal on its conscience ' \
    'Yak yacht beyond mystery'
# Plain vowels and plain consonants.
fun = vowelRegex.findall(thestring)
nofun = consoRegex.findall(thestring)

# Vowel y's: each consonant-y-consonant match has the y at index 1.
fun += [f[1] for f in yVowlRegex.findall(thestring)]
fun += yVow2Regex.findall(thestring)

# Consonant y's: each vowel-y-vowel match has the y at index 1.
nofun += [f[1] for f in yconsRegex.findall(thestring)]
nofun += ycon2Regex.findall(thestring)

print(thestring)
print('Vowels:', ''.join(fun), len(fun))
print('Consos:', ''.join(nofun), len(nofun))


'''
Sky         Vowel; endswith 1
Family      Vowel; endswith 2 
Yurt        Consonant; startswith 1
Germany     Vowel; endswith 3
Crypt       Vowel; sandwiched 1
Day         Vowel; endswith 4
New York    Consonant; startswith 2
Pennsylva   Vowel; sandwiched 2
Myth        Vowel; sandwiched 3
Hungry      Vowel; endswith 5
Yolk        Consonant; startswith 3
Year        Consonant; startswith 4
Bayou       Consonant; sandwiched 1
Yak         Consonant; startswith 5
Silly       Vowel; endswith 6
Beyond      Consonant; sandwiched 2
Dynamite    Vowel; sandwiched 4
Mystery     Vowel; sandwiched, Vowel; endswith!
Yacht       Consonant; startswith 6
Yoda        Consonant; startswith 7
'''
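On the question of merging the patterns and avoiding repeated passes: one possible sketch, assuming the four rules in the comment block above are the only ones needed (y between consonants or at the end of a word is a vowel; y between vowels or at the start of a word is a consonant), puts both classes into a single compiled pattern with two named groups. Lookbehind (?<=...) and lookahead (?=...) assertions check a y's neighbours without consuming them, so every match is one letter, no f[1] extraction or rescan is needed, and m.lastgroup tells you which class matched. The names CONS, LETTER_RE, count_letters and vowel_consonant_ratio are hypothetical, not from the original post.

```python
import re
from collections import Counter

# Every consonant except y; re.IGNORECASE below also covers uppercase.
CONS = "b-df-hj-np-tv-xz"

# One pattern, two named groups. Lookarounds inspect the neighbours of a y
# without consuming them, so each match is exactly one letter and the
# string is scanned only once.
LETTER_RE = re.compile(
    # vowel: a/e/i/o/u, y between consonants (Crypt), or word-final y (Sky)
    rf"(?P<vowel>[aeiou]|(?<=[{CONS}])y(?=[{CONS}])|y\b)"
    # conso: any consonant but y, word-initial y (Yak), or y between vowels (Bayou)
    rf"|(?P<conso>[{CONS}]|\by|(?<=[aeiou])y(?=[aeiou]))",
    re.IGNORECASE,
)


def count_letters(text):
    """Return (vowel_count, consonant_count) using a single finditer() pass."""
    # m.lastgroup is the name of the group that matched: "vowel" or "conso".
    tally = Counter(m.lastgroup for m in LETTER_RE.finditer(text))
    return tally["vowel"], tally["conso"]


def vowel_consonant_ratio(text):
    """Vowels per consonant; assumes text contains at least one consonant."""
    vowels, consonants = count_letters(text)
    return vowels / consonants


if __name__ == "__main__":
    print(count_letters("Sky Yak bayou"))  # (5, 6)
```

Because the vowel group is tried first, a lone word "y" (both word-initial and word-final) is counted as a vowel; the 20-word list contains no such case.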

You can use the | (or) operator in a regex, which could reduce it a bit. For example:

yVowlRegex = re.compile(r'[b-df-hj-np-tv-xzB-DF-HJ-NP-TV-XZ]y[b-df-hj-np-tv-xz]|y\b') 

This one pattern now covers both yVowlRegex and yVow2Regex.
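A caveat when merging the patterns this way: findall() now returns a mix of three-letter matches such as 'ryp' (y at index 1) and one-letter matches 'y' (y at index 0), so the f[1] loop from the question would raise IndexError on the one-letter matches. Since each match holds exactly one vowel y, the count is simply the number of matches. A quick check on a made-up sample string:

```python
import re

# The merged pattern from the answer above.
yVowlRegex = re.compile(r'[b-df-hj-np-tv-xzB-DF-HJ-NP-TV-XZ]y[b-df-hj-np-tv-xz]|y\b')

matches = yVowlRegex.findall("Crypt Sky Mystery")
print(matches)       # ['ryp', 'y', 'Mys', 'y']

# Every match holds exactly one vowel y, so the count is the match count.
print(len(matches))  # 4

# If the y characters themselves are needed, take index 1 only from 3-letter matches.
ys = [m[1] if len(m) == 3 else m for m in matches]
print(ys)            # ['y', 'y', 'y', 'y']
```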

@Joshua-Lewis's answer led me to the following streamlined version of the code above:

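import re

# Pass 1: a plain vowel, a consonant-y-consonant triple, or a word-final y.
vowelRegex = re.compile(r'[aeiouAEIOU]|[b-df-hj-np-tv-xzB-DF-HJ-NP-TV-XZ]y[b-df-hj-np-tv-xz]|y\b')
# Pass 1: a plain consonant, a vowel-y-vowel triple, or a word-initial y/Y.
consoRegex = re.compile(r'[b-df-hj-np-tv-xzB-DF-HJ-NP-TV-XZ]|[aeiou]y[aeiou]|\b[Yy]')
# Pass 2: pull only the vowels (now including y) or only the consonants
# (now including y) back out of the joined pass-1 matches.
vowelRescan = re.compile(r'[aeiouyAEIOUY]')
consoRescan = re.compile(r'[b-df-hj-np-tv-xyzB-DF-HJ-NP-TV-XYZ]')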
thestring='any and every religion is violent irrational intolerant '\
    'allied to racism and tribalism bigotry invested in ignorance and '\
    'hostile to free inquiry contemptuous of women and coercive towards '\
    'children organized religion ought to have a great deal on its '\
    'conscience why it continues toward the 22nd century ACE is a mystery '\
    'known only to New Yorkers and lovers of the bayou'
# Join the pass-1 matches into one string, then rescan it for single letters.
fun = ''.join(vowelRegex.findall(thestring))
fun = ''.join(vowelRescan.findall(fun))
nofun = ''.join(consoRegex.findall(thestring))
nofun = ''.join(consoRescan.findall(nofun))

print(thestring)
print('Vowels:', fun, len(fun))
print('Consos:', nofun, len(nofun))



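One limitation of this version: the consonant-y-consonant alternative consumes the consonants on both sides, so a y whose left neighbour was eaten by the previous match is skipped. By the question's own rules, syzygy has three vowel y's, but vowelRegex matches 'syz' and then cannot see the z before the middle y. The sketch below compares the streamlined code with a hypothetical variant (vowelLookaround, not from the original post) that checks the neighbours with lookbehind and lookahead instead of consuming them:

```python
import re

# vowelRegex and vowelRescan exactly as in the streamlined code above.
vowelRegex = re.compile(r'[aeiouAEIOU]|[b-df-hj-np-tv-xzB-DF-HJ-NP-TV-XZ]y[b-df-hj-np-tv-xz]|y\b')
vowelRescan = re.compile(r'[aeiouyAEIOUY]')

# Hypothetical variant: lookarounds test the neighbours without consuming them.
vowelLookaround = re.compile(
    r'[aeiou]|(?<=[b-df-hj-np-tv-xz])y(?=[b-df-hj-np-tv-xz])|y\b',
    re.IGNORECASE,
)

word = 'syzygy'  # all three y's sit in vowel positions

old = vowelRescan.findall(''.join(vowelRegex.findall(word)))
new = vowelLookaround.findall(word)

print(old)  # ['y', 'y']  (the middle y is lost)
print(new)  # ['y', 'y', 'y']
```

The variant also uses re.IGNORECASE, so unlike the original pattern it treats uppercase and lowercase letters alike everywhere, including the consonant after the y.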
