Evaluate consonant/vowel composition of word string in Python

I'm trying to transform a Python string from its original form to its vowel/consonant combinations.

E.g. 'Dog' becomes 'cvc' and 'Bike' becomes 'cvcv'.

In R I was able to employ the following method:

   con_vowel <- gsub("[aeiouAEIOU]","V",df$col_name)
   con_vowel <- gsub("[^V]","C",con_vowel)
   df[["composition"]] <- con_vowel

This checks whether each character is a vowel and, if so, replaces it with 'V'. It then takes that string and replaces anything that isn't 'V' with 'C', and finally places the result in a new column called 'composition' in the dataframe.

In Python I have written some code in an attempt to replicate this functionality, but it does not return the desired result. Please see below.

word = 'yoyo'


for i in word.lower():
    if i in "aeiou":
       word = i.replace(i ,'v')
    else: word = i.replace(i ,'c')
print(word)

The theory here is that each character would be evaluated and, if it isn't a vowel, then by deduction it must be a consonant. However, the result I get is:

v

I understand why this is happening, but I am no clearer as to how to achieve my desired result.

Please note that I also need the resulting code to be applied to a dataframe column and to create a new column from these results.

If you could explain the workings of your answer it would help me greatly.

Thanks in advance.

There's a method for this: it's translate. It's efficient, and by default it passes through values that are not found in your translation table (like ' ').

You can use the string library to get all of the consonants if you want.

import pandas as pd
import string

df = pd.DataFrame(['Cat', 'DOG', 'bike', 'APPLE', 'foo bar'], columns=['words'])

vowels = 'aeiouAEIOU'
cons = ''.join(set(string.ascii_letters).difference(set(vowels)))
trans = str.maketrans(vowels+cons, 'v'*len(vowels)+'c'*len(cons))

df['translated'] = df['words'].str.translate(trans)

     words translated
0      Cat        cvc
1      DOG        cvc
2     bike       cvcv
3    APPLE      vcccv
4  foo bar    cvv cvc
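The pass-through behavior is easiest to see on a plain string, without pandas. The following is a minimal sketch using the built-in str.maketrans and str.translate; the helper name cv_pattern is only illustrative and does not come from the answer above.

```python
import string

vowels = "aeiouAEIOU"
# Every ASCII letter that is not a vowel is treated as a consonant here.
consonants = "".join(ch for ch in string.ascii_letters if ch not in vowels)

# Map each vowel to 'v' and each consonant to 'c'.
table = str.maketrans(vowels + consonants,
                      "v" * len(vowels) + "c" * len(consonants))


def cv_pattern(text):
    """Hypothetical helper: return the consonant/vowel pattern of text."""
    # Characters missing from the table (spaces, digits, punctuation)
    # are left unchanged by translate.
    return text.translate(table)


print(cv_pattern("Dog"))       # cvc
print(cv_pattern("foo bar!"))  # cvv cvc!
```

Because the table is built once and reused, the per-string work is a single lookup per character.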

It's made for exactly this, so it's fast.

# Supporting code
import perfplot
import numpy as np  # needed for np.random.choice in the setup below
import pandas as pd
import string

def with_translate(s):
    vowels = 'aeiouAEIOU'
    cons = ''.join(set(string.ascii_letters).difference(set(vowels)))
    trans = str.maketrans(vowels+cons, 'v'*len(vowels)+'c'*len(cons))

    return s.str.translate(trans)


def with_replace(s):
    return s.replace({"[^aeiouAEIOU]":'c', '[aeiouAEIOU]':'v'}, regex=True)


perfplot.show(
    setup=lambda n: pd.Series(np.random.choice(['foo', 'bAR', 'foobar', 'APPLE', 'ThisIsABigWord'], n)), 
    kernels=[
        lambda s: with_translate(s),
        lambda s: with_replace(s),
    ],
    labels=['Translate', 'Replace'],
    n_range=[2 ** k for k in range(19)],
    equality_check=None,  
    xlabel='len(s)'
)

You can use replace with regex=True:

import pandas as pd

words = pd.Series(['This', 'is', 'an', 'Example'])
words.str.lower().replace({"[^aeiou]":'c', '[aeiou]':'v'}, regex=True)

Output:

0       ccvc
1         vc
2         vc
3    vcvcccv
dtype: object
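The order of the two patterns matters. The output above is consistent with the non-vowel pattern being applied first, which is safe because the marker 'c' is itself a non-vowel and so is not touched by the second pattern. As a rough sketch of the same two-step idea on a single string, using the standard-library re module (the function name cv_pattern is an illustrative assumption):

```python
import re


def cv_pattern(word):
    """Hypothetical helper: two regex passes, consonants first."""
    lowered = word.lower()
    # Pass 1: every non-vowel becomes 'c'. Since 'c' is not a vowel,
    # pass 2 cannot accidentally rewrite these markers.
    marked = re.sub(r"[^aeiou]", "c", lowered)
    # Pass 2: the remaining characters are vowels and become 'v'.
    return re.sub(r"[aeiou]", "v", marked)


print(cv_pattern("This"))     # ccvc
print(cv_pattern("Example"))  # vcvcccv
print(cv_pattern("eve"))      # vcv
```

Doing the passes in the opposite order would first turn 'eve' into 'vvv' and then into 'ccc', because the marker 'v' is a non-vowel.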

Use str.replace with a regex to avoid the loop:

import pandas as pd

df = pd.DataFrame(['Cat', 'DOG', 'bike'], columns=['words'])
# regex=True is needed in pandas 2.0+, where str.replace matches literally by default
df['new_word'] = df['words'].str.lower().str.replace(r"[^aeiou]", 'c', regex=True).str.replace(r"[aeiou]", 'v', regex=True)
print(df)

  words new_word
0   Cat      cvc
1   DOG      cvc
2  bike     cvcv

In Python, strings are immutable.
Why?

There are several advantages.

One is performance: knowing that a string is immutable means we can allocate space for it at creation time, and the storage requirements are fixed and unchanging. This is also one of the reasons for the distinction between tuples and lists.

Another advantage is that strings in Python are considered as “elemental” as numbers. No amount of activity will change the value 8 to anything else, and in Python, no amount of activity will change the string “eight” to anything else.
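A short sketch of what immutability means in practice: item assignment on a str raises a TypeError, and str.replace returns a new string rather than modifying the original. The variable names are illustrative only.

```python
word = "yoyo"

try:
    word[0] = "c"  # str does not support item assignment
except TypeError as exc:
    print("TypeError:", exc)

# replace() builds and returns a new string; 'word' itself is untouched.
replaced = word.replace("y", "c")

print(word)      # yoyo
print(replaced)  # coco
```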

In order to reduce confusion and potential errors, it is preferable to create a new string instead of changing the original. I have also added isalpha() so that we can tell whether we are dealing with a letter or a number/symbol and act accordingly.

Here's my code:

word = 'yoyo'

def vocals_consonants_transformation(word):
    modified_word = ""
    for char in word:
        if char.isalpha():
            # Compare in lower case so that 'A' and 'a' both count as vowels
            if char.lower() in "aeiou":
                modified_word += 'v'
            else:
                modified_word += 'c'
        else:
            # Leave digits, spaces and symbols unchanged
            modified_word += char
    return modified_word


print(vocals_consonants_transformation(word))

Output:
cvcv

Source:
https://docs.python.org/3/faq/design.html#why-are-python-strings-immutable

Try this:

word = 'yoyo'
word = list(word)

for i in range(len(word)):
    if word[i] in 'aeiou':
        word[i] = 'v'
    else:
        word[i] = 'c'

print(''.join(word))

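Try it like this:

word = 'yoyo'

# Lower-case first so that every character we iterate over actually occurs in word
word = word.lower()
for i in word:
    # Use upper-case markers so they cannot be confused with letters still to be processed
    if i in "aeiou":
        word = word.replace(i, 'V')
    else:
        word = word.replace(i, 'C')
print(word.lower())

vowels = set("aeiou")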
word = "Dog"

new_word = ""
for char in word.lower():
    new_word += "v" if char in vowels else "c"

print(new_word)

Note that this uses a set for vowels for a faster membership test. Other than that, we traverse the lowercased version of the word and add the desired character (v or c) to the newly formed string via a conditional (ternary) expression.
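The same idea can also be written without repeated string concatenation by joining a generator expression. This is only a sketch of an alternative formulation; the names VOWELS and cv_pattern are assumptions introduced for illustration.

```python
VOWELS = frozenset("aeiou")


def cv_pattern(word):
    """Hypothetical helper: 'v' for vowels, 'c' for everything else."""
    # The conditional expression picks the marker for each character;
    # join() assembles the markers into one new string.
    return "".join("v" if ch in VOWELS else "c" for ch in word.lower())


print(cv_pattern("Dog"))   # cvc
print(cv_pattern("Bike"))  # cvcv
```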

You probably already realized this, but in your solution the for loop determines for each letter whether it is a vowel or not but does not save the result. This is why it only gives you the result of the last iteration (v, since 'o' is a vowel).
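To make this concrete, here is the loop from the question with one addition: a list (named history purely for illustration) that records the value of word after each iteration. Each pass overwrites word with a single character, so only the last one survives.

```python
word = "yoyo"
history = []  # illustrative: value of 'word' after each iteration

for i in word.lower():
    if i in "aeiou":
        word = i.replace(i, "v")  # i is one character, so this is just 'v'
    else:
        word = i.replace(i, "c")  # likewise, this is just 'c'
    history.append(word)

print(history)  # ['c', 'v', 'c', 'v']
print(word)     # v
```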

You can try creating a new, empty string and then add to it:

word='yoyo'
new_word=''

for i in word.lower():
    if i in "aeiou":
        new_word+='v'
    else:
        new_word+='c'

print(new_word)
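The question also asks for the result as a new DataFrame column. One possible way, sketched here under the assumption that the words live in a column called 'words', is to wrap the loop in a function and pass it to Series.map. The function name cv_pattern and the column names are illustrative.

```python
import pandas as pd


def cv_pattern(word):
    """Hypothetical helper wrapping the loop shown above."""
    new_word = ""
    for ch in word.lower():
        new_word += "v" if ch in "aeiou" else "c"
    return new_word


df = pd.DataFrame({"words": ["Dog", "Bike", "yoyo"]})
# map() calls cv_pattern once per value and returns a new Series.
df["composition"] = df["words"].map(cv_pattern)
print(df)
```

This runs a Python-level function for every row, so on large columns the vectorized string methods shown in the other answers are likely to be faster.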
