Counting consonants and vowels in split strings

Question

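I read a .csv file. The code below counts the vowels and consonants in the strings of the Description column. This works well, but I want to split Description into 8 columns and count the consonants and vowels in each of those columns. The second part of my code splits Description into 8 columns. How can I count the vowels and consonants in all 8 columns that Description is split into?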

```python
import pandas as pd
import re

def anti_vowel(s):
    # Remove all vowels (case-insensitive); not used below
    result = re.sub(r'[AEIOU]', '', s, flags=re.IGNORECASE)
    return result

data = pd.read_csv('http://core.secure.ehc.com/src/util/detail-price-list/TristarDivision_SummitMedicalCenter_CM.csv')

# Drop rows that contain any missing value
data.dropna(inplace=True)

# Count vowels and consonants in each Description string
data['Vowels'] = data['Description'].str.count(r'[aeiou]', flags=re.I)
data['Consonant'] = data['Description'].str.count(r'[bcdfghjklmnpqrstvwxzy]', flags=re.I)

print(data)
```
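Because the CSV URL may not be reachable, here is a minimal, self-contained sketch of the same counting step on a few hypothetical strings (the sample values are made up for illustration). It shows that digits and spaces fall into neither character class, and that `y` counts as a consonant with this pattern.

```python
import re

import pandas as pd

# Hypothetical sample data standing in for the CSV's Description column
data = pd.DataFrame({"Description": ["HELLO WORLD", "ABC DEF", "TAB 500MG"]})

# Same character classes as the question (y is treated as a consonant)
data["Vowels"] = data["Description"].str.count(r"[aeiou]", flags=re.I)
data["Consonant"] = data["Description"].str.count(r"[bcdfghjklmnpqrstvwxyz]", flags=re.I)

print(data)
```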

This is the code I use to split the Description column into 8 columns:

```python
import pandas as pd

data = pd.read_csv('http://core.secure.ehc.com/src/util/detail-price-list/TristarDivision_SummitMedicalCenter_CM.csv')

data.dropna(inplace=True)

# n=8 allows at most 8 splits, so this yields up to 9 columns (0 to 8)
data = data["Description"].str.split(" ", n=8, expand=True)

print(data)
```
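Note that `n` caps the number of splits, not the number of columns: `n=8` yields up to 9 columns (0 to 8), and `n=7` would be needed for at most 8. The sketch below, on hypothetical strings, makes this visible along with two side effects that later appear as `NaN` and `0` counts: shorter descriptions get missing cells, and consecutive spaces produce empty strings.

```python
import pandas as pd

# Hypothetical descriptions: one long, one short containing a double space
desc = pd.Series(["A B C D E F G H I J K", "X  Y"])

parts = desc.str.split(" ", n=8, expand=True)

print(parts.shape)             # (2, 9): at most 8 splits gives at most 9 pieces
print(parts.iloc[0, 8])        # I J K: everything left after the 8th split
print(repr(parts.iloc[1, 1]))  # empty string between the two spaces
```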

Now how can I put the two together?

To read each of the 8 columns and count the consonants, I know I can use the following, replacing 0 with each of 0 to 7:

```python
testconsonant = data[0].str.count(r'[bcdfghjklmnpqrstvwxzy]', flags=re.I)
testvowel = data[0].str.count(r'[aeiou]', flags=re.I)
```

The desired output is:

Description[0], vowel count, consonant count, Description[1], vowel count, consonant count, Description[2], vowel count, consonant count, Description[3], vowel count, consonant count, Description[4], vowel count, consonant count, and so on up to Description[7].
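One straightforward way to get that layout is to generalize the single-column snippet into a loop over every split column. This is a sketch assuming `parts` is the frame returned by `str.split(..., expand=True)`; the helper `count_per_piece` and output names such as `vowels[0]` are hypothetical. Since dicts keep insertion order, the columns come out interleaved per piece.

```python
import re

import pandas as pd

VOWELS = r"[aeiou]"
CONSONANTS = r"[bcdfghjklmnpqrstvwxyz]"


def count_per_piece(parts: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: for each split column i, emit the piece, its vowel count and its consonant count."""
    out = {}
    for col in parts.columns:
        out[f"Description[{col}]"] = parts[col]
        # Missing pieces (shorter descriptions) give NaN counts
        out[f"vowels[{col}]"] = parts[col].str.count(VOWELS, flags=re.I)
        out[f"consonants[{col}]"] = parts[col].str.count(CONSONANTS, flags=re.I)
    return pd.DataFrame(out)


# Hypothetical descriptions
parts = pd.Series(["HELLO WORLD", "TAB 500MG EACH"]).str.split(" ", n=8, expand=True)
result = count_per_piece(parts)
print(result)
```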

Answer 1

`stack`, then `unstack`

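```python
# `data` here is the split frame from the question
stacked = data.stack()  # long format: one entry per (row, piece index)
pd.concat({
    'Vowels': stacked.str.count('[aeiou]', flags=re.I),
    'Consonant': stacked.str.count('[bcdfghjklmnpqrstvwxzy]', flags=re.I)
}, axis=1).unstack()  # wide again: columns are (count type, piece index)
```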

```
      Consonant                                         Vowels
              0    1    2    3    4    5    6    7    8      0    1    2    3    4    5    6    7    8
0           3.0  5.0  5.0  1.0  2.0  NaN  NaN  NaN  NaN    1.0  0.0  0.0  0.0  0.0  NaN  NaN  NaN  NaN
1           8.0  5.0  1.0  0.0  0.0  0.0  0.0  0.0  NaN    0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  NaN
2           8.0  5.0  1.0  0.0  0.0  0.0  0.0  0.0  NaN    0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  NaN
3           8.0  5.0  1.0  0.0  0.0  0.0  0.0  0.0  NaN    0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  NaN
4           3.0  5.0  3.0  1.0  0.0  0.0  0.0  0.0  NaN    0.0  0.0  2.0  0.0  0.0  0.0  0.0  0.0  NaN
5           3.0  5.0  3.0  1.0  0.0  0.0  0.0  0.0  NaN    0.0  0.0  2.0  0.0  0.0  0.0  0.0  0.0  NaN
6           3.0  4.0  0.0  1.0  0.0  0.0  0.0  NaN  NaN    3.0  1.0  0.0  0.0  0.0  0.0  0.0  NaN  NaN
7           3.0  3.0  0.0  1.0  0.0  0.0  0.0  NaN  NaN    3.0  1.0  0.0  1.0  0.0  0.0  0.0  NaN  NaN
8           3.0  3.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0    3.0  1.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0
9           3.0  3.0  0.0  1.0  0.0  0.0  0.0  NaN  NaN    3.0  1.0  0.0  1.0  0.0  0.0  0.0  NaN  NaN
10          3.0  3.0  0.0  1.0  0.0  0.0  0.0  0.0  NaN    3.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  NaN
11          3.0  3.0  0.0  2.0  2.0  NaN  NaN  NaN  NaN    3.0  0.0  0.0  0.0  0.0  NaN  NaN  NaN  NaN
12          3.0  3.0  0.0  1.0  0.0  0.0  0.0  0.0  NaN    3.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  NaN
13          3.0  3.0  0.0  2.0  2.0  NaN  NaN  NaN  NaN    3.0  1.0  0.0  0.0  0.0  NaN  NaN  NaN  NaN
14          3.0  5.0  0.0  2.0  0.0  0.0  0.0  0.0  0.0    3.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
15          3.0  3.0  0.0  3.0  1.0  NaN  NaN  NaN  NaN    3.0  0.0  0.0  0.0  1.0  NaN  NaN  NaN  NaN
```
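The same approach can be tried without the CSV. The sketch below runs it on hypothetical strings, then applies `swaplevel` and `sort_index` on the column axis so the counts are grouped by piece (0, 1, 2, ...) rather than by count type, which is closer to the layout asked for in the question. Depending on the pandas version, `stack` either drops missing pieces or keeps them as `NaN`; in both cases `unstack` leaves `NaN` where a piece does not exist.

```python
import re

import pandas as pd

# Hypothetical descriptions, split into pieces as in the question
parts = pd.Series(["HELLO WORLD", "TAB 500MG EACH"]).str.split(" ", expand=True)

# Long format: one entry per (row, piece index)
stacked = parts.stack()

counts = pd.concat(
    {
        "Vowels": stacked.str.count(r"[aeiou]", flags=re.I),
        "Consonant": stacked.str.count(r"[bcdfghjklmnpqrstvwxyz]", flags=re.I),
    },
    axis=1,
).unstack()  # wide again: columns are (count type, piece index)

# Move the piece index to the outer column level and sort,
# so each piece's two counts sit side by side
by_piece = counts.swaplevel(axis=1).sort_index(axis=1)
print(by_piece)
```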

If you want to combine this with the `data` frame, you can do the following:

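```python
stacked = data.stack()
pd.concat({
    # Use the stacked pieces so they align on the same (row, piece) index as the counts
    'Data': stacked,
    'Vowels': stacked.str.count('[aeiou]', flags=re.I),
    'Consonant': stacked.str.count('[bcdfghjklmnpqrstvwxzy]', flags=re.I)
}, axis=1).unstack()
```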
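To match the exact order requested in the question (piece text, vowel count, consonant count, repeated for each piece), one option is to build the combined frame from `stacked` and then reindex the columns in an explicit order. This is a sketch on hypothetical strings; the `order` construction is illustrative and not part of the original answer.

```python
import re

import pandas as pd

# Hypothetical descriptions, split into pieces as in the question
parts = pd.Series(["HELLO WORLD", "TAB 500MG EACH"]).str.split(" ", expand=True)
stacked = parts.stack()

combined = pd.concat(
    {
        "Data": stacked,  # the piece text itself
        "Vowels": stacked.str.count(r"[aeiou]", flags=re.I),
        "Consonant": stacked.str.count(r"[bcdfghjklmnpqrstvwxyz]", flags=re.I),
    },
    axis=1,
).unstack()

# Explicit column order: for each piece, its text, then its vowel and consonant counts
order = pd.MultiIndex.from_tuples(
    [(kind, piece) for piece in parts.columns for kind in ("Data", "Vowels", "Consonant")]
)
combined = combined.reindex(columns=order)
print(combined)
```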

Answer 1: accepted, score 3, posted 2019-07-02 02:57:23.