
Finding consecutive consonants in a word

I need code that will show me the consecutive consonants in a word. For example, for "concertation" I need to obtain ["c","nc","rt","t","n"].

Here is my code:

def SuiteConsonnes(mot):
    consonnes=[]
    for x in mot:
        if x in "bcdfghjklmnprstvyz":
           consonnes += x + ''
    return consonnes

I manage to find the consonants, but I don't see how to group the consecutive ones together. Can anybody tell me what I need to do?

You can use regular expressions, implemented in the re module.

Better solution

>>> import re
>>> re.findall(r'[bcdfghjklmnpqrstvwxyz]+', "concertation", re.IGNORECASE)
['c', 'nc', 'rt', 't', 'n']
  • [bcdfghjklmnpqrstvwxyz]+ matches any sequence of one or more characters from the character class.

  • re.IGNORECASE enables case-insensitive matching of the characters. That is:

     >>> re.findall(r'[bcdfghjklmnpqrstvwxyz]+', "CONCERTATION", re.IGNORECASE)
     ['C', 'NC', 'RT', 'T', 'N']
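If this is needed in more than one place, the pattern can be compiled once and wrapped in a small helper. This is only a sketch: the names CONSONANT_RUN and consonant_runs are made up for illustration, and 'y' is counted as a consonant, as in the pattern above.

```python
import re

# Hypothetical names, not from the original answer.
# 'y' is treated as a consonant, matching the character class used above.
CONSONANT_RUN = re.compile(r'[bcdfghjklmnpqrstvwxyz]+', re.IGNORECASE)

def consonant_runs(word):
    """Return every maximal run of consecutive consonants in word."""
    return CONSONANT_RUN.findall(word)

print(consonant_runs("concertation"))
print(consonant_runs("CONCERTATION"))
```

Because the class lists consonants explicitly, digits and punctuation are never included in a run.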

Another solution

>>> import re
>>> re.findall(r'[^aeiou]+', "concertation")
['c', 'nc', 'rt', 't', 'n']
  • [^aeiou] is a negated character class. It matches any character other than the ones listed in the class. In short, it matches the consonants in the string.

  • The + quantifier matches one or more occurrences of the preceding pattern.

Note: This also matches non-alphabetic characters, because the character class accepts anything other than a vowel. Adjacent digits or punctuation therefore end up in the results.

Example

>>> re.findall(r'[^aeiou]+', "123concertation")
['123c', 'nc', 'rt', 't', 'n']

If you are sure that the input always contains only letters, this solution is fine.
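If the input may contain digits or punctuation, one way to keep the negated-class style is to exclude non-letters inside the class as well. The sketch below is an assumption on my part, not part of the answer above: the helper name letter_consonant_runs is hypothetical, and re.ASCII is used so that only the letters a-z and A-Z count.

```python
import re

# Hypothetical helper. Inside the negated class:
#   \W excludes non-word characters, \d excludes digits, _ excludes underscore,
#   and aeiou excludes vowels, so only consonant letters remain.
# re.ASCII restricts \W and \d to ASCII; re.IGNORECASE also excludes uppercase vowels.
LETTER_CONSONANTS = re.compile(r'[^\W\d_aeiou]+', re.IGNORECASE | re.ASCII)

def letter_consonant_runs(text):
    return LETTER_CONSONANTS.findall(text)

print(letter_consonant_runs("123concertation"))
print(letter_consonant_runs("con-cert!"))
```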


 re.findall(pattern, string, flags=0)

    Return all non-overlapping matches of pattern in string, as a list of strings. 
    The string is scanned left-to-right, and matches are returned in the order found. 

If you are curious about how the result is obtained for

re.findall(r'[bcdfghjklmnpqrstvwxyz]+', "concertation")

concertation
|
c

concertation
 |
 # o is not in the character class. Matching ends here. Adds the match 'c' to the output list


concertation
  |
  n

concertation
   |
   c


concertation
    |
     # Match ends again. Adds match 'nc' to list 
     # And so on
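The same left-to-right scan can be observed with re.finditer, which yields match objects that carry the start and end position of each run. A small sketch (the variable names are just for illustration):

```python
import re

pattern = re.compile(r'[bcdfghjklmnpqrstvwxyz]+')

# Each match object records where its run starts and ends in the input.
spans = [(m.group(), m.start(), m.end()) for m in pattern.finditer("concertation")]

for text, start, end in spans:
    print(f"{text!r} at [{start}:{end}]")
```

Each run ends at the first character that is not in the class, and the scan resumes right after it.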

You could do this with regular expressions and the re module's split function:

>>> import re
>>> re.split(r"[aeiou]+", "concertation", flags=re.I)
['c', 'nc', 'rt', 't', 'n']

This method splits the string wherever one or more consecutive vowels are matched.

To explain the regular expression "[aeiou]+": the vowels have been collected into a class [aeiou], while the + indicates that one or more occurrences of any character in this class can be matched. Hence the string "concertation" is split at o, e, a and io.

The re.I flag means that the case of the letters is ignored, effectively making the character class equal to [aAeEiIoOuU].

Edit: One thing to keep in mind is that this method implicitly assumes that the word contains only vowels and consonants. Numbers and punctuation are not vowels, so they are kept in the output as if they were consonants. To match only consecutive consonants, use re.findall with the consonants listed in the character class instead (as noted in other answers).
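Another edge case worth checking: when the word starts or ends with a vowel, re.split returns empty strings at those ends. A minimal sketch of filtering them out, where split_on_vowels is a hypothetical helper name:

```python
import re

def split_on_vowels(word):
    # Hypothetical helper: split on runs of vowels, then drop the empty
    # strings that re.split yields when the word starts or ends with a vowel.
    return [part for part in re.split(r"[aeiou]+", word, flags=re.I) if part]

raw = re.split(r"[aeiou]+", "orange", flags=re.I)
print(raw)                        # contains empty strings at both ends
print(split_on_vowels("orange"))  # empty strings removed
```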

One useful shortcut to typing out all the consonants is to use the third-party regex module instead of re.

This module supports set operations, so the character class containing the consonants can be neatly written as the entire alphabet minus the vowels:

[[a-z]--[aeiou]] # equal to [bcdfghjklmnpqrstvwxyz]

Where [a-z] is the entire alphabet, -- is set difference and [aeiou] are the vowels.
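If installing a third-party module is not an option, the same "alphabet minus vowels" idea can be approximated with the standard library by building the class in plain Python first. This is a sketch of that alternative, not the regex module's set syntax; the variable names are illustrative.

```python
import re
import string

VOWELS = "aeiou"

# Set difference done in plain Python: every ASCII lowercase letter that is not a vowel.
consonants = "".join(ch for ch in string.ascii_lowercase if ch not in VOWELS)

# The resulting string contains only letters, so it is safe to place inside [...].
pattern = re.compile(f"[{consonants}]+", re.IGNORECASE)

print(consonants)
print(pattern.findall("concertation"))
```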

If you are up for a non-regex solution, itertools.groupby works perfectly well here, like this:

>>> from itertools import groupby
>>> is_vowel = lambda char: char in "aAeEiIoOuU"
>>> def suiteConsonnes(in_str):
...     return ["".join(g) for v, g in groupby(in_str, key=is_vowel) if not v]
... 
>>> suiteConsonnes("concertation")
['c', 'nc', 'rt', 't', 'n']
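To see why this works, it can help to look at what groupby produces before the vowel groups are filtered out. A small sketch (is_vowel is written as a regular function here, and the variable names are illustrative):

```python
from itertools import groupby

def is_vowel(char):
    return char in "aAeEiIoOuU"

# groupby starts a new group every time the key (vowel or not) changes,
# so the word is cut into alternating consonant and vowel runs.
runs = [(key, "".join(group)) for key, group in groupby("concertation", key=is_vowel)]
print(runs)

# Keeping only the groups whose key is False leaves the consonant runs.
consonant_only = [run for key, run in runs if not key]
print(consonant_only)
```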

A really, really simple solution without importing anything is to replace the vowels with a single placeholder character, then split on that character:

def SuiteConsonnes(mot):
    consonnes = ''.join([l if l not in "aeiou" else "0" for l in mot])
    return [c for c in consonnes.split("0") if c]  # drop the empty strings left by adjacent vowels

To keep it really similar to your code - and to add generators - we get this:

def SuiteConsonnes(mot):
    consonnes=[]
    for x in mot:
        if x in "bcdfghjklmnprstvyz":
            consonnes.append(x)
        elif consonnes:
            yield ''.join(consonnes)
            consonnes = []
    if consonnes: yield ''.join(consonnes)

def SuiteConsonnes(mot):
    consonnes=[]
    consecutive = '' # initialize consecutive string of consonants
    for x in mot:
        if x in "aeiou":  # x is a vowel, so the current run of consonants ends
            if consecutive:  # only store non-empty runs
                consonnes.append(consecutive)  # append the finished run to consonnes
                consecutive = ''  # reset for the next run of consonants
        else:
            consecutive += x  # x is not a vowel, so it extends the current run
    if consecutive: # checks if consecutive string is not empty
        consonnes.append(consecutive)  # append last consecutive string of consonants  
    return consonnes

SuiteConsonnes('concertation')
#['c', 'nc', 'rt', 't', 'n']

Not that I'd recommend it for readability, but a one-line solution is:

In [250]: q = "concertation"
In [251]: [s for s in ''.join([l if l not in 'aeiou' else ' ' for l in q]).split()]
Out[251]: ['c', 'nc', 'rt', 't', 'n']

That is: replace each vowel with a space, then split on whitespace.

Use regular expressions from the built-in re module:

import re

def find_consonants(string):
    # find all non-vowels occurring one or more times:
    return re.findall(r'[^aeiou]+', string)

Although I think you should use @nu11p01n73R's answer, this will also work:

re.sub('[AaEeIiOoUu]+',' ','concertation').split()
