
Vowel groups from string in python

I'm trying to extract all the vowel groups from a string and get the index of each vowel group. For example, in the word 'britain' the vowel groups are 'i' and 'ai', and their indices in the string are 2 and 4. I would like to create two lists that keep track of the vowel groups and their indices in the string. Maybe there is a way to do this with regex or itertools groupby.

This is my code so far:

first='phoebe'
vowels=['a','e','i','o','u']
char=""
lst=[]
for i in range(len(first)-1):
    if first[i] in vowels:
        char+=first[i]
    if first[i] not in vowels:
        lst.append(char)
        char=""

You can do this with a regex:

import re

s = 'fountain of youth'

indices = []
strings = []

for m in re.finditer(r'[aeiou]+', s):
    indices.append(m.start())
    strings.append(m.group())
    
indices, strings
# ([1, 5, 9, 13], ['ou', 'ai', 'o', 'ou'])

It wouldn't be hard to do this as a zipped iterator, but you need to be careful if the string may contain no vowels.
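A minimal sketch of what the zipped version might look like, with a guard for the no-vowel case. The function name vowel_groups is hypothetical and not part of the original answer. The guard is there because unpacking zip(*pairs) into two names raises ValueError when pairs is empty.

```python
import re

def vowel_groups(s):
    # Hypothetical helper: collect (start index, matched text) for each vowel run.
    pairs = [(m.start(), m.group()) for m in re.finditer(r'[aeiou]+', s)]
    if not pairs:
        # zip(*[]) yields nothing, so "indices, strings = zip(*pairs)" would fail here.
        return [], []
    indices, strings = zip(*pairs)
    return list(indices), list(strings)

print(vowel_groups('fountain of youth'))
# ([1, 5, 9, 13], ['ou', 'ai', 'o', 'ou'])
print(vowel_groups('rhythm'))
# ([], [])
```

Like the regex above, this only matches lowercase a, e, i, o and u.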

first = 'phoebe'
vowels = ['a','e','i','o','u']
vowelGroup = ""
vowelGroups = []
indices = []
index = -1
for i in range(len(first)): #Don't do -1 here otherwise you would miss last 'e' from 'phoebe'
    if first[i] in vowels:
        vowelGroup += first[i]
        if index == -1:
            index = i
    elif index != -1:
        vowelGroups.append(vowelGroup)
        indices.append(index)
        vowelGroup = ""
        index = -1
if index != -1:
    vowelGroups.append(vowelGroup)
    indices.append(index)
print(vowelGroups, indices)

You could do this with itertools.groupby, grouping on whether the letter is a vowel, and then extracting the indices and strings from the groupby object:

import itertools

first='phoebe'
vowels=['a','e','i','o','u']
vw = itertools.groupby(enumerate(first), key=lambda t:t[1] in vowels)
vgrps = [list(g) for k, g in vw if k]
indices = [g[0][0] for g in vgrps]
print(indices)
strings = [''.join(t[1] for t in g) for g in vgrps]
print(strings)

Output:

[2, 5]
['oe', 'e']
