Count the number of Chinese characters in each row of a column in Python

Question

Given a dataframe as follows:

   id            name
0   1             个体户
1   2              个人
2   3  利他润己企业管理有限公司
3   4    博通国际投资有限公司
4   5      西潼·科技有限公司
5   6      度咪科技有限公司

How can I count the number of Chinese characters in each row of the name column?

The expected result looks like this:

   id            name           count
0   1             个体户            3
1   2              个人             2
2   3    利他润己企业管理有限公司    12
3   4      博通国际投资有限公司      10
4   5        西潼科技有限公司        8
5   6        度咪科技有限公司        8

Answer 1

You can use str.count together with a regex pattern that matches the CJK Unified Ideographs range (U+4E00 to U+9FFF):

df['count'] = df['name'].str.count(pat='[\u4e00-\u9fff]')

Result:

   id                    name   count
0   1                   个体户      3
1   2                    个人       2
2   3  利他润己企业管理有限公司      12
3   4      博通国际投资有限公司      10
4   5        西潼·科技有限公司       8
5   6         度咪科技有限公司       8
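For anyone who wants to try this end to end, here is a minimal self-contained sketch. It assumes that "Chinese character" means a code point in the range U+4E00 to U+9FFF, and it rebuilds the sample dataframe from the question by hand; the variable name `pattern` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6],
    'name': ['个体户', '个人', '利他润己企业管理有限公司',
             '博通国际投资有限公司', '西潼·科技有限公司', '度咪科技有限公司'],
})

# Assumption: only the basic CJK Unified Ideographs block counts as "Chinese".
pattern = '[\u4e00-\u9fff]'

# Series.str.count returns the number of regex matches in each row.
df['count'] = df['name'].str.count(pattern)
print(df)
```

The middle dot in row 4 (U+00B7) lies outside the range, so it is not counted. Ideographs in the extension blocks would not be counted either; the character class would have to be widened if those matter for your data.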

Answer 2

The following code works, but I would appreciate it if you could share other possible solutions.

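def hans_count(text):
    """Count the characters of text that fall in the CJK Unified Ideographs block."""
    hans_total = 0
    # The parameter is named text so that it does not shadow the built-in str.
    for ch in text:
        # U+4E00..U+9FFF is the full block (the original upper bound was U+9FEF).
        if '\u4e00' <= ch <= '\u9fff':
            hans_total += 1
    return hans_total

# Apply the counter to every row of the name column.
df['count'] = df['name'].apply(hans_count)
df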

Out:

   id            name  count
0   1             个体户      3
1   2              个人      2
2   3    利他润己企业管理有限公司     12
3   4      博通国际投资有限公司     10
4   5       西潼·科技有限公司     8
5   6        度咪科技有限公司     8
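Since this answer asks for other possible solutions, here are two more compact variants, offered as a sketch rather than a definitive implementation. Both assume the same U+4E00 to U+9FFF range; the names `HAN_PATTERN`, `hans_count_regex` and `hans_count_sum` are hypothetical and do not come from the original post.

```python
import re

# Assumption: a "Chinese character" is any code point in U+4E00..U+9FFF.
HAN_PATTERN = re.compile('[\u4e00-\u9fff]')

def hans_count_regex(text):
    """Count matches of the precompiled character class."""
    return len(HAN_PATTERN.findall(text))

def hans_count_sum(text):
    """Sum the boolean range checks; True counts as 1."""
    return sum('\u4e00' <= ch <= '\u9fff' for ch in text)

# Names copied from the question.
names = ['个体户', '个人', '利他润己企业管理有限公司',
         '博通国际投资有限公司', '西潼·科技有限公司', '度咪科技有限公司']

counts_regex = [hans_count_regex(name) for name in names]
counts_sum = [hans_count_sum(name) for name in names]
print(counts_regex)
print(counts_sum)
```

Either function can be passed to df['name'].apply(...) in place of hans_count.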

Answer 1
Score 6, accepted, 2020-12-28 09:37:57

Answer 2
Score 0, 2020-12-28 09:17:36
