
Counting consonants and vowels in a split string

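I read in a .csv file. I have the following DataFrame, in which I count the vowels and consonants in the strings of the Description column. This works fine, but my problem is that I want to split Description into 8 columns and count the consonants and vowels in each of those columns. The second part of my code lets me split Description into 8 columns. How can I count the vowels and consonants across all 8 columns that Description is split into?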

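import pandas as pd
import re

def anti_vowel(s):
    # Remove every vowel (case-insensitive); not used in the counting below.
    result = re.sub(r'[AEIOU]', '', s, flags=re.IGNORECASE)
    return result

data = pd.read_csv('http://core.secure.ehc.com/src/util/detail-price-list/TristarDivision_SummitMedicalCenter_CM.csv')

# Drop every row that has a missing value in any column (including Description).
data.dropna(inplace=True)

# Count vowels and consonants (y is treated as a consonant) in each full Description.
data['Vowels'] = data['Description'].str.count(r'[aeiou]', flags=re.I)
data['Consonant'] = data['Description'].str.count(r'[bcdfghjklmnpqrstvwxzy]', flags=re.I)

print(data)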

This is the code I use to split the Description column into 8 columns.

import pandas as pd

data = pd.read_csv('http://core.secure.ehc.com/src/util/detail-price-list/TristarDivision_SummitMedicalCenter_CM.csv')

data.dropna(inplace=True)

# Split on single spaces. n=8 allows at most 8 splits, so up to 9 columns (0-8);
# rows with fewer words are padded with missing values in the trailing columns.
data = data["Description"].str.split(" ", n=8, expand=True)

print(data)
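To make the column count concrete: a minimal sketch, using two hypothetical description strings in place of the CSV (whose URL may no longer be reachable), showing that n=8 yields at most 9 columns and that shorter rows are padded with missing values.

```python
import pandas as pd

# Hypothetical sample descriptions standing in for the CSV's Description column.
descriptions = pd.Series([
    "ONE TWO THREE FOUR FIVE SIX SEVEN EIGHT NINE TEN",  # 10 words
    "XRAY CHEST",                                         # 2 words
])

# n=8 caps the number of splits at 8, so expand=True produces at most 9 columns;
# everything after the 8th space stays together in the last column.
parts = descriptions.str.split(" ", n=8, expand=True)

print(parts.shape)  # (2, 9)
print(parts)
```

The second row only fills columns 0 and 1; the remaining positions are missing, which is why the counts further down contain NaN.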

Now how can I put these together?

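To count the consonants (and vowels) in each of the 8 columns, I know I can use the following, replacing 0 with each column number from 0 to 7: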

testconsonant = data[0].str.count(r'[bcdfghjklmnpqrstvwxzy]', flags=re.I)
testvowel = data[0].str.count(r'[aeiou]', flags=re.I)
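One straightforward way to avoid editing the column number by hand is to run those same two str.count calls in a loop over data.columns and collect the results into one frame. This is a minimal sketch on a small hypothetical split frame; the constant names VOWELS and CONSONANTS are illustrative, not from the question.

```python
import re
import pandas as pd

# Hypothetical split frame standing in for the result of str.split(..., expand=True);
# None marks a position where a row ran out of words.
data = pd.DataFrame([["XRAY", "CHEST", "TWO"], ["CT", "HEAD", None]])

VOWELS = r"[aeiou]"                       # hypothetical name for the vowel pattern
CONSONANTS = r"[bcdfghjklmnpqrstvwxzy]"   # hypothetical name for the consonant pattern

# Apply the same two str.count calls to every column instead of one at a time.
counts = {}
for col in data.columns:
    counts[(col, "Vowels")] = data[col].str.count(VOWELS, flags=re.I)
    counts[(col, "Consonant")] = data[col].str.count(CONSONANTS, flags=re.I)

# Tuple keys become a two-level column index: (column position, count type).
result = pd.DataFrame(counts)
print(result)
```

The loop keeps each position's vowel and consonant counts next to each other; missing words produce NaN counts.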

The desired output is:

Description[0], vowel count, consonant count, Description[1], vowel count, consonant count, Description[2], vowel count, consonant count, Description[3], vowel count, consonant count, Description[4], vowel count, consonant count, and so on, all the way to Description[7]

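Use stack, then unstack. stack() turns the split frame into a single Series indexed by (row, column position), so each regex count runs over every word in one call; unstack() then moves the column position back out into the columns. Note that with n=8 the split produces up to 9 columns, numbered 0 to 8, as in the output below.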

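import re
import pandas as pd

# data is the split frame from the question (one column per word position).
stacked = data.stack()
pd.concat({
    'Vowels': stacked.str.count('[aeiou]', flags=re.I),
    'Consonant': stacked.str.count('[bcdfghjklmnpqrstvwxzy]', flags=re.I)
}, axis=1).unstack()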

      Consonant                                         Vowels                                        
              0    1    2    3    4    5    6    7    8      0    1    2    3    4    5    6    7    8
0           3.0  5.0  5.0  1.0  2.0  NaN  NaN  NaN  NaN    1.0  0.0  0.0  0.0  0.0  NaN  NaN  NaN  NaN
1           8.0  5.0  1.0  0.0  0.0  0.0  0.0  0.0  NaN    0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  NaN
2           8.0  5.0  1.0  0.0  0.0  0.0  0.0  0.0  NaN    0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  NaN
3           8.0  5.0  1.0  0.0  0.0  0.0  0.0  0.0  NaN    0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  NaN
4           3.0  5.0  3.0  1.0  0.0  0.0  0.0  0.0  NaN    0.0  0.0  2.0  0.0  0.0  0.0  0.0  0.0  NaN
5           3.0  5.0  3.0  1.0  0.0  0.0  0.0  0.0  NaN    0.0  0.0  2.0  0.0  0.0  0.0  0.0  0.0  NaN
6           3.0  4.0  0.0  1.0  0.0  0.0  0.0  NaN  NaN    3.0  1.0  0.0  0.0  0.0  0.0  0.0  NaN  NaN
7           3.0  3.0  0.0  1.0  0.0  0.0  0.0  NaN  NaN    3.0  1.0  0.0  1.0  0.0  0.0  0.0  NaN  NaN
8           3.0  3.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0    3.0  1.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0
9           3.0  3.0  0.0  1.0  0.0  0.0  0.0  NaN  NaN    3.0  1.0  0.0  1.0  0.0  0.0  0.0  NaN  NaN
10          3.0  3.0  0.0  1.0  0.0  0.0  0.0  0.0  NaN    3.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  NaN
11          3.0  3.0  0.0  2.0  2.0  NaN  NaN  NaN  NaN    3.0  0.0  0.0  0.0  0.0  NaN  NaN  NaN  NaN
12          3.0  3.0  0.0  1.0  0.0  0.0  0.0  0.0  NaN    3.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  NaN
13          3.0  3.0  0.0  2.0  2.0  NaN  NaN  NaN  NaN    3.0  1.0  0.0  0.0  0.0  NaN  NaN  NaN  NaN
14          3.0  5.0  0.0  2.0  0.0  0.0  0.0  0.0  0.0    3.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
15          3.0  3.0  0.0  3.0  1.0  NaN  NaN  NaN  NaN    3.0  0.0  0.0  0.0  1.0  NaN  NaN  NaN  NaN
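The same stack/unstack approach can be checked without the CSV. This is a minimal sketch on a small hypothetical split frame; depending on the pandas version, stack() may emit a FutureWarning about its new implementation, but the resulting counts are the same.

```python
import re
import pandas as pd

# Hypothetical split frame; None marks a position where a row ran out of words.
data = pd.DataFrame([["XRAY", "CHEST", "TWO"], ["CT", "HEAD", None]])

# stack() turns the wide frame into one Series indexed by (row, column position),
# so each regex is applied to every word in a single call.
stacked = data.stack()

counts = pd.concat({
    "Vowels": stacked.str.count("[aeiou]", flags=re.I),
    "Consonant": stacked.str.count("[bcdfghjklmnpqrstvwxzy]", flags=re.I),
}, axis=1)

# unstack() moves the column position back out of the row index,
# giving one block of columns per count type; missing words become NaN.
result = counts.unstack()
print(result)
```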

If you want to include the words themselves alongside the counts, add the stacked words as a third key:

stacked = data.stack()
pd.concat({
    # Use the stacked words so all three Series share the same (row, column) index.
    'Data': stacked,
    'Vowels': stacked.str.count('[aeiou]', flags=re.I),
    'Consonant': stacked.str.count('[bcdfghjklmnpqrstvwxzy]', flags=re.I)
}, axis=1).unstack()
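The unstacked frame groups its columns by key (all Data columns, then all Vowels columns, and so on). To get the interleaved layout described in the question, with each position's word followed by its vowel and consonant counts, the column levels can be swapped and reordered. A minimal sketch on a hypothetical two-column frame; the "Data" label could be renamed to "Description" if preferred.

```python
import re
import pandas as pd

data = pd.DataFrame([["XRAY", "CHEST"], ["CT", "HEAD"]])  # hypothetical split frame

stacked = data.stack()
combined = pd.concat({
    "Data": stacked,
    "Vowels": stacked.str.count("[aeiou]", flags=re.I),
    "Consonant": stacked.str.count("[bcdfghjklmnpqrstvwxzy]", flags=re.I),
}, axis=1).unstack()

# Columns are (kind, position); swap so the split position comes first.
combined = combined.swaplevel(axis=1)

# Explicit order: for each position, the word, then its vowel and consonant counts.
order = [(col, kind) for col in data.columns for kind in ("Data", "Vowels", "Consonant")]
combined = combined[order]
print(combined)
```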


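Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.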

 
© 2020-2024 STACKOOM.COM