
Evaluate consonant/vowel composition of word string in Python

I'm trying to transform a Python string from its original form to its vowel/consonant combinations.

E.g. 'Dog' becomes 'cvc' and 'Bike' becomes 'cvcv'.

In R I was able to employ the following method:

   con_vowel <- gsub("[aeiouAEIOU]","V",df$col_name)
   con_vowel <- gsub("[^V]","C",con_vowel)
   df[["composition"]] <- con_vowel

This replaces every vowel with 'V', then replaces every character that isn't 'V' with 'C', and stores the result in a new column called 'composition' in the dataframe.
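For comparison, the same two-step substitution can be written in plain Python with the standard re module. This is a minimal sketch: the function name to_cv is hypothetical, it lowercases first and uses 'v'/'c' to match the desired output, and it deliberately replaces the non-vowels first. With the R order, a marker letter already present in the word (an uppercase 'V' in R, or a 'v' here) would be misclassified as a vowel in the second step.

```python
import re

def to_cv(word: str) -> str:
    """Map each character of `word` to 'v' (vowel) or 'c' (anything else)."""
    lowered = word.lower()
    # Step 1: every character that is not a vowel becomes 'c'.
    # Doing this first is safe because 'c' is not a vowel, so step 2 cannot touch it.
    marked = re.sub(r"[^aeiou]", "c", lowered)
    # Step 2: every remaining vowel becomes 'v'.
    return re.sub(r"[aeiou]", "v", marked)

print(to_cv("Dog"))   # cvc
print(to_cv("Bike"))  # cvcv
print(to_cv("Vase"))  # cvcv
```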

In Python I have written some code in an attempt to replicate this functionality, but it does not return the desired result. Please see below.

word = 'yoyo'


for i in word.lower():
    if i in "aeiou":
       word = i.replace(i ,'v')
    else: word = i.replace(i ,'c')
print(word)

The theory here is that each character would be evaluated and, if it isn't a vowel, then by deduction it must be a consonant. However, the result I get is:

v

I understand why this is happening, but I am no clearer as to how to achieve my desired result.

Please note that I also need the resultant code to be applied to a dataframe column and create a new column from these results.
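For reference, the general-purpose way to run any per-word function over a dataframe column and store the result in a new column is Series.apply. A minimal sketch, assuming a column named 'col_name' as in the R snippet; the helper word_to_cv is hypothetical, and like the R version it marks every non-vowel (including spaces or digits) as 'c'.

```python
import pandas as pd

VOWELS = set("aeiou")

def word_to_cv(word: str) -> str:
    # Build the pattern character by character: 'v' for vowels, 'c' otherwise.
    return "".join("v" if ch in VOWELS else "c" for ch in word.lower())

df = pd.DataFrame({"col_name": ["Dog", "Bike", "yoyo"]})
# apply() calls word_to_cv once per cell and returns a new Series,
# which is then assigned to a new column.
df["composition"] = df["col_name"].apply(word_to_cv)
print(df)
```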

If you could explain the workings of your answer it would help me greatly.

Thanks in advance.

There's a method for this: str.translate. It's efficient, and characters that are not in your translation table (like ' ') are passed through unchanged.

You can use the string library to get all of the consonants if you want.

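import pandas as pd
import string

df = pd.DataFrame(['Cat', 'DOG', 'bike', 'APPLE', 'foo bar'], columns=['words'])

vowels = 'aeiouAEIOU'
# every ASCII letter that is not a vowel (order is irrelevant, since all map to 'c')
cons = ''.join(set(string.ascii_letters).difference(set(vowels)))
# map each vowel to 'v' and each consonant to 'c'; other characters are left alone
trans = str.maketrans(vowels+cons, 'v'*len(vowels)+'c'*len(cons))

df['translated'] = df['words'].str.translate(trans)
print(df)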

     words translated
0      Cat        cvc
1      DOG        cvc
2     bike       cvcv
3    APPLE      vcccv
4  foo bar    cvv cvc
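The same translation table also works on a plain str outside pandas. A minimal sketch that builds the consonants from string.ascii_letters in a fixed order (ASCII letters only, so accented letters would pass through unchanged) and shows that characters missing from the table, such as spaces and digits, are left as they are:

```python
import string

vowels = "aeiouAEIOU"
# Every ASCII letter that is not a vowel counts as a consonant.
consonants = "".join(ch for ch in string.ascii_letters if ch not in vowels)

# str.maketrans(x, y) maps x[i] -> y[i]; both strings must have equal length.
table = str.maketrans(vowels + consonants, "v" * len(vowels) + "c" * len(consonants))

print("Dog".translate(table))         # cvc
print("foo bar 42".translate(table))  # cvv cvc 42
```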

It's made for exactly this, so it's fast.


# Supporting code
import numpy as np
import perfplot
import pandas as pd
import string

def with_translate(s):
    vowels = 'aeiouAEIOU'
    cons = ''.join(set(string.ascii_letters).difference(set(vowels)))
    trans = str.maketrans(vowels+cons, 'v'*len(vowels)+'c'*len(cons))

    return s.str.translate(trans)


def with_replace(s):
    return s.replace({"[^aeiouAEIOU]":'c', '[aeiouAEIOU]':'v'}, regex=True)


perfplot.show(
    setup=lambda n: pd.Series(np.random.choice(['foo', 'bAR', 'foobar', 'APPLE', 'ThisIsABigWord'], n)), 
    kernels=[
        lambda s: with_translate(s),
        lambda s: with_replace(s),
    ],
    labels=['Translate', 'Replace'],
    n_range=[2 ** k for k in range(19)],
    equality_check=None,  
    xlabel='len(s)'
)
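One practical difference between the two kernels: the regex version rewrites every non-vowel, including spaces and punctuation, as 'c', while the translate version leaves characters outside its table alone. A stdlib-only sketch of both behaviours, using re.sub in place of Series.replace (the helper names are hypothetical):

```python
import re
import string

vowels = "aeiouAEIOU"
consonants = "".join(ch for ch in string.ascii_letters if ch not in vowels)
table = str.maketrans(vowels + consonants, "v" * len(vowels) + "c" * len(consonants))

def via_translate(s: str) -> str:
    # Only letters in the table are mapped; anything else passes through.
    return s.translate(table)

def via_regex(s: str) -> str:
    # [^aeiouAEIOU] matches every non-vowel, so the space becomes 'c' too.
    return re.sub(r"[aeiouAEIOU]", "v", re.sub(r"[^aeiouAEIOU]", "c", s))

print(via_translate("foo bar"))  # cvv cvc
print(via_regex("foo bar"))      # cvvccvc
```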

You can use replace with regex=True:

import pandas as pd

words = pd.Series(['This', 'is', 'an', 'Example'])
words.str.lower().replace({"[^aeiou]":'c', '[aeiou]':'v'}, regex=True)

Output:

0       ccvc
1         vc
2         vc
3    vcvcccv
dtype: object
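The dictionary version relies on the two patterns being applied one after the other in the order written, as the output above reflects: non-vowels must become 'c' before vowels become 'v', otherwise the 'v' markers are themselves non-vowels and get overwritten. A stdlib sketch with re.sub showing both orders:

```python
import re

word = "example"

# Non-vowels first, then vowels: 'c' is not a vowel, so step 2 leaves it alone.
right = re.sub(r"[aeiou]", "v", re.sub(r"[^aeiou]", "c", word))

# Vowels first, then non-vowels: the 'v' markers match [^aeiou] and are lost.
wrong = re.sub(r"[^aeiou]", "c", re.sub(r"[aeiou]", "v", word))

print(right)  # vcvcccv
print(wrong)  # ccccccc
```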

Use str.replace with a regex to avoid the loop:

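import pandas as pd

df = pd.DataFrame(['Cat', 'DOG', 'bike'], columns=['words'])
# use str.replace; regex=True is required in pandas >= 2.0, where the default became regex=False
df['new_word'] = df['words'].str.lower().str.replace(r"[^aeiou]", 'c', regex=True).str.replace(r"[aeiou]", 'v', regex=True)
print(df)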

  words new_word
0   Cat      cvc
1   DOG      cvc
2  bike     cvcv
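Both chained versions make two passes over each string. As an alternative that is not part of the answer above, re.sub accepts a function as the replacement, so a single pass can decide 'v' or 'c' per letter. A minimal sketch (helper names hypothetical) that only rewrites ASCII letters and leaves other characters as they are:

```python
import re

VOWELS = frozenset("aeiou")

def classify(match) -> str:
    # Called once per matched letter; decide vowel vs consonant.
    return "v" if match.group().lower() in VOWELS else "c"

def to_cv_one_pass(word: str) -> str:
    # [A-Za-z] matches single ASCII letters; non-letters are untouched.
    return re.sub(r"[A-Za-z]", classify, word)

print(to_cv_one_pass("Bike"))     # cvcv
print(to_cv_one_pass("foo bar"))  # cvv cvc
```

pandas' Series.str.replace also accepts a callable as repl when regex=True, so the same classify function should work column-wide.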

In Python, strings are immutable. Why?

There are several advantages.

One is performance: knowing that a string is immutable means we can allocate space for it at creation time, and the storage requirements are fixed and unchanging. This is also one of the reasons for the distinction between tuples and lists.

Another advantage is that strings in Python are considered as “elemental” as numbers. No amount of activity will change the value 8 to anything else, and in Python, no amount of activity will change the string “eight” to anything else.
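A short demonstration of what this means in practice: methods such as upper() or replace() hand back a new string and leave the original alone, and assigning to an index raises TypeError. This is why a loop that builds a result has to store it in a variable rather than edit the string in place.

```python
word = "yoyo"

# String methods never modify the string they are called on;
# they return a new string, which must be stored somewhere.
shouted = word.upper()
print(word, shouted)  # yoyo YOYO

# Item assignment is not allowed on str.
try:
    word[0] = "c"
    assignment_failed = False
except TypeError as exc:
    assignment_failed = True
    print("TypeError:", exc)
```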

To reduce confusion and potential errors, it is preferable to create a new string instead of changing the original. I have also added isalpha() to check whether we are dealing with a letter or with a number/symbol, and act accordingly.

Here's my code:

word = 'yoyo'

def vocals_consonants_transformation(word):
    modified_word = ""
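    for i in range(0, len(word)):
        if word[i].isalpha():
            # lower() so that uppercase vowels are also recognised
            if word[i].lower() in "aeiou":
                modified_word += 'v'
            else:
                modified_word += 'c'
        else:
            # keep digits, spaces and punctuation unchanged
            modified_word += word[i]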
    return modified_word


print(vocals_consonants_transformation(word))

Output
cvcv

Source:
https://docs.python.org/3/faq/design.html#why-are-python-strings-immutable
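The same rules (letters are classified, anything else is copied through) can be expressed more compactly with str.join over a generator expression, which also avoids building the result with repeated +=. A sketch; the function name to_cv_keep_other is hypothetical:

```python
def to_cv_keep_other(word: str) -> str:
    # Letters become 'v' or 'c'; digits, spaces and punctuation are kept as-is.
    return "".join(
        ("v" if ch.lower() in "aeiou" else "c") if ch.isalpha() else ch
        for ch in word
    )

print(to_cv_keep_other("yoyo"))   # cvcv
print(to_cv_keep_other("Dog-2"))  # cvc-2
```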

Try this:

word = 'yoyo'
word = list(word)

for i in range(len(word)):
    if word[i].lower() in 'aeiou':  # lower() catches uppercase vowels too
        word[i] = 'v'
    else:
        word[i] = 'c'

print(''.join(word))

Try it like this:

word = 'yoyo'
word = word.lower()  # lowercase first, so uppercase letters are matched too

for i in word:
    if i in "aeiou":
        word = word.replace(i, 'v')
    else:
        word = word.replace(i, 'c')
print(word)

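One caveat with the replace-in-a-loop approach: str.replace rewrites every occurrence in the whole string, including 'c' and 'v' characters produced by earlier iterations and any 'c' or 'v' letters in the word itself. It works for 'yoyo' but not for every word. A sketch reproducing the loop as a function (the name replace_loop is hypothetical), with the input lowercased first:

```python
def replace_loop(word: str) -> str:
    word = word.lower()
    for ch in word:  # iterates over the original lowercased word
        if ch in "aeiou":
            word = word.replace(ch, "v")
        else:
            word = word.replace(ch, "c")
    return word

print(replace_loop("yoyo"))  # cvcv  (correct)
print(replace_loop("cave"))  # cccv  (should be cvcv)
```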
vowels = set("aeiou")
word = "Dog"

new_word = ""
for char in word.lower():
    new_word += "v" if char in vowels else "c"

print(new_word)

Note that this uses a set for vowels for faster membership tests. Other than that, we traverse the lowercased version of the word and append the desired character (v or c) to the new string via a conditional expression.

You probably already realized this, but in your solution the for loop determines for each letter whether it is a vowel or not but does not save the result. This is why it only gives you the result of the last iteration (v, since 'o' is a vowel).
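To see this concretely, here is the question's loop with the value of word recorded after each iteration (the trace list is added only for illustration):

```python
word = "yoyo"
trace = []

for i in word.lower():
    if i in "aeiou":
        word = i.replace(i, "v")  # rebinds word to the single character 'v'
    else:
        word = i.replace(i, "c")  # rebinds word to the single character 'c'
    trace.append(word)

print(trace)  # ['c', 'v', 'c', 'v']
print(word)   # v  (only the last iteration survives)
```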

You can try creating a new, empty string and then add to it:

word='yoyo'
new_word=''

for i in word.lower():
    if i in "aeiou":
        new_word+='v'
    else:
        new_word+='c'

print(new_word)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM