
Count all words in comma separated strings per group in pandas

I would like to count the schools (separated by commas) in the data frame given below.

DataFrame:

State    Counties    Schools_list
S1       C1          GradeA,GradeB,GradeC
S1       C1          GradeD
S2       C1          GradeA,GradeB
S2       C2          GradeC
S3       C2          GradeA,GradeB
S3       C3          GradeC,GradeD

Expected output:

State          Schools_count
S1             4
S2             3
S3             4

How can I count the comma-separated schools in the last column, grouped by State?

A simple solution here would be to count the commas:

df['Schools_list'].str.count(',').add(1).groupby(df.State).sum()

State
S1    4
S2    3
S3    4
Name: Schools_list, dtype: int64

Adding 1 to each row's comma count gives the number of schools in that row; grouping by State and summing then gives the total per state.
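For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the table in the question, and the intermediate names `per_row` and `schools_per_state` are only illustrative.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "State": ["S1", "S1", "S2", "S2", "S3", "S3"],
    "Counties": ["C1", "C1", "C1", "C2", "C2", "C3"],
    "Schools_list": [
        "GradeA,GradeB,GradeC", "GradeD", "GradeA,GradeB",
        "GradeC", "GradeA,GradeB", "GradeC,GradeD",
    ],
})

# A row containing n commas lists n + 1 schools.
per_row = df["Schools_list"].str.count(",").add(1)

# Sum the per-row counts within each state.
schools_per_state = per_row.groupby(df["State"]).sum()
print(schools_per_state)
```

Splitting the one-liner into two steps makes it easier to inspect the per-row counts before they are aggregated.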

To get the result as a DataFrame:

(df['Schools_list'].str.count(',')
                   .add(1)
                   .groupby(df.State)
                   .sum()
                   .reset_index(name='Schools_count'))

  State  Schools_count
0    S1              4
1    S2              3
2    S3              4

You can also split on commas and take the length of each resulting list, but this is a bit slower. The pattern ',+' is treated as a regular expression, so a run of consecutive commas counts as a single separator.

df['Schools_list'].str.split(',+').str.len().groupby(df.State).sum()

State
S1    4
S2    3
S3    4
Name: Schools_list, dtype: int64
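Both approaches assume every cell holds at least one school. If the column may contain missing values, empty strings, or stray commas, something along these lines may be safer. This is only a sketch: the sample data and the helper `count_schools` are hypothetical, and treating blank cells as zero schools is an assumption that may not fit your data.

```python
import pandas as pd

# Hypothetical data: one missing value and one empty string.
df = pd.DataFrame({
    "State": ["S1", "S1", "S2", "S2"],
    "Schools_list": ["GradeA,GradeB", None, "", "GradeC,"],
})

def count_schools(cell):
    # Assumption: missing or blank cells contain no schools.
    if not isinstance(cell, str):
        return 0
    # Ignore empty pieces so "A,,B" or a trailing comma is not overcounted.
    return sum(1 for name in cell.split(",") if name.strip())

schools_per_state = df["Schools_list"].map(count_schools).groupby(df["State"]).sum()
print(schools_per_state)
```

A plain Python function applied with `map` will generally be slower than the vectorized string methods, so it is probably only worth using when the data is known to be messy.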
