Count the number of strings in each row of a column with pandas

Question

In my dataframe:

no    pre_code
1     23, 234, 345
2     234, 345
3     23
4     NaN

I want to count the number of strings in the pre_code column. What I have tried so far is:

df['count'] = df['pre_code'].astype('str').str.split(',').str.len().fillna(0)

But the code above counts NaN as 1, so I don't get the desired result.

Before that, I also tried this:

df['count'] = df['pre_code'].str.count(',').add(1).fillna(0)

Unfortunately, the code above did not work on my dataframe either: it gives me 0 for rows with a single entry. For your information, my dataframe has 2200 rows, and somehow the code does not work correctly on all of them. When I tried it on only 5 rows, it worked fine.

I expect the result to look like this:

no    pre_code         count
1     23, 234, 345       3
2     234, 345           2
3     23                 1
4     NaN                0

Is there any solution for my case?

Thanks in advance.

Answer 1

I think you need a real missing value (np.nan) instead of the string 'nan'; then both of your solutions work correctly.

First check what the values that contain no digits look like, so you know what to replace:

print (df.loc[~df['pre_code'].str.contains(r'\d'), 'pre_code'].unique().tolist())
['nan']

df['count'] = df['pre_code'].replace('nan', np.nan).str.split(',').str.len().fillna(0)

Or:

df['count'] = df['pre_code'].replace('nan', np.nan).str.count(',').add(1).fillna(0)

print (df)
   no      pre_code  count
0   1  23, 234, 345    3.0
1   2      234, 345    2.0
2   3            23    1.0
3   4           NaN    0.0
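
For anyone who wants to reproduce this without the original data, here is a minimal self-contained sketch. It assumes the missing entry is stored as the literal string 'nan', as the check above suggests; the sample frame is rebuilt from the question and is not the real 2200-row data.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; the missing value is assumed
# to be the literal string 'nan', not a real NaN.
df = pd.DataFrame({
    "no": [1, 2, 3, 4],
    "pre_code": ["23, 234, 345", "234, 345", "23", "nan"],
})

# Without the replacement, the string 'nan' is counted as one item.
naive = df["pre_code"].str.split(",").str.len()

# Turn the placeholder into a real NaN first, then count the commas.
fixed = (
    df["pre_code"]
    .replace("nan", np.nan)
    .str.count(",")
    .add(1)
    .fillna(0)
    .astype(int)
)

print(naive.tolist())  # [3, 2, 1, 1]
print(fixed.tolist())  # [3, 2, 1, 0]
```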

EDIT: A more general solution is to convert values that contain no digits to NaN, using Series.where with Series.str.contains:

df['count'] = (df['pre_code'].where(df['pre_code'].str.contains(r'\d', na=False))
                             .str.count(',')
                             .add(1)
                             .fillna(0)
                             .astype(int))
print (df)
   no      pre_code  count
0   1  23, 234, 345      3
1   2      234, 345      2
2   3            23      1
3   4           NaN      0
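
To illustrate why the mask is more general, here is a small sketch on a hypothetical series that mixes several kinds of "empty" entries. The placeholder '-' is my own assumption for illustration and does not come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical values: a string 'nan', a dash placeholder and a real NaN.
s = pd.Series(
    ["23, 234, 345", "234, 345", "23", "nan", "-", np.nan],
    dtype=object,
)

# True only where the value is a string containing at least one digit.
has_digit = s.str.contains(r"\d", na=False)

# Everything without a digit becomes NaN and therefore ends up as 0.
counts = (
    s.where(has_digit)
    .str.count(",")
    .add(1)
    .fillna(0)
    .astype(int)
)

print(counts.tolist())  # [3, 2, 1, 0, 0, 0]
```

Unlike replace('nan', np.nan), this does not need to know in advance how the missing entries are spelled.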

Answer 2

Try:

df['count'] = df.loc[df['pre_code'].notna(), 'pre_code'] \
                .astype(str).str.split(',').str.len() \
                .reindex(df.index, fill_value=0)

print(df)

# Output:
   no      pre_code  count
0   1  23, 234, 345      3
1   2      234, 345      2
2   3            23      1
3   4           NaN      0

I'm not sure you have to convert to str (`astype(str)`).
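
One case where the cast would matter is an object column that mixes strings with plain numbers, because the `.str` methods return NaN for elements that are not strings. This is only a guess at what the asker's data looks like (it would explain single entries being counted as 0); the sample series below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical column: the single entry 23 is an int, not the string "23".
s = pd.Series(["23, 234, 345", "234, 345", 23, np.nan], dtype=object)

# .str.count yields NaN for the int, so fillna(0) turns it into 0.
without_cast = s.str.count(",").add(1).fillna(0).astype(int)

# Casting only the non-missing values to str first counts it as 1.
with_cast = (
    s[s.notna()]
    .astype(str)
    .str.count(",")
    .add(1)
    .reindex(s.index, fill_value=0)
    .astype(int)
)

print(without_cast.tolist())  # [3, 2, 0, 0]
print(with_cast.tolist())     # [3, 2, 1, 0]
```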
