Check a column for a substring to assign new columns with values

Question

This is my dataframe with 2 columns:

ID      CODES
36233   LEH,PW
6175    N/A
6242    
6680    MS,XL,JFK

In column CODES, I need to check whether each value contains a comma (","), count the number of commas, and return both results as new columns in the dataframe:

Output:

ID      CODES      HAS COMMA   NO. OF COMMAS
36233   LEH,PW     TRUE        1
6175    N/A        FALSE       0
6242               FALSE       0
6680    MS,XL,JFK  TRUE        2

So far I've tried DF['HAS COMMA'] = np.where(DF['CODE'].str.contains(','),True, False) but this returns TRUE where there are blanks. :(

Additionally DF['NO OF COMMAs']=DF['CODE'].count(",") returns an error.

Answer 1

How about:

df['HAS COMMA'] = df.CODES.str.contains(',').fillna(False)
df['NO. OF COMMA'] =  df.CODES.str.count(',').fillna(0)

prints:

      ID      CODES  HAS COMMA  NO. OF COMMA
0  36233     LEH,PW       True           1.0
1   6175        N/A      False           0.0
2   6242        NaN      False           0.0
3   6680  MS,XL,JFK       True           2.0
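
A variant of the same idea, as a rough sketch: `str.contains` accepts an `na` argument, so missing values can be mapped to False directly, and casting the filled count to int gives whole numbers as in the desired output. The sample frame below is rebuilt from the question, with the blank CODES cell assumed to be NaN. This is probably also why the `np.where` attempt flagged the blanks: `str.contains` returns NaN for missing cells, and NaN is treated as truthy.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; the blank CODES cell is assumed to be NaN.
df = pd.DataFrame({
    "ID": [36233, 6175, 6242, 6680],
    "CODES": ["LEH,PW", "N/A", np.nan, "MS,XL,JFK"],
})

# na=False makes missing values evaluate to False instead of NaN;
# regex=False treats "," as a literal substring.
df["HAS COMMA"] = df["CODES"].str.contains(",", regex=False, na=False)

# str.count returns NaN for missing values; fill with 0 and cast to int.
df["NO. OF COMMAS"] = df["CODES"].str.count(",").fillna(0).astype(int)

print(df)
```

With this variant the count column holds integers rather than floats, and no separate `fillna(False)` step is needed for the boolean column.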

Answer 2

Pandas string methods are not optimized, so a Python list comprehension would be more efficient for this task. For example, the code below is about 8 times faster than the equivalent pandas str methods for a df with 4k rows.

Simply check if a comma exists in each value of df.CODES and decide whether to count or not.

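# The isinstance check skips blank (NaN) cells, which are floats and would raise a TypeError on `in`.
df[['HAS COMMA', 'NO. OF COMMA']] = [[True, s.count(',')] if isinstance(s, str) and ',' in s else [False, 0] for s in df['CODES'].tolist()]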
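
To check the speed claim on your own data, something like the following could be used. It is only a sketch: the 4,000-row frame is synthetic (the original benchmark data is not shown), the helper names `with_str_methods` and `with_list_comp` are made up for illustration, and the ratio will vary with the pandas version and the machine. The list comprehension includes an `isinstance` guard so that NaN cells do not raise.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic 4,000-row frame (an assumption; the original benchmark data is not shown).
df = pd.DataFrame({"CODES": ["LEH,PW", "N/A", np.nan, "MS,XL,JFK"] * 1000})


def with_str_methods(frame):
    """Pandas string methods, in the style of Answer 1."""
    has_comma = frame["CODES"].str.contains(",", regex=False, na=False)
    n_commas = frame["CODES"].str.count(",").fillna(0).astype(int)
    return has_comma.tolist(), n_commas.tolist()


def with_list_comp(frame):
    """Plain Python loop; non-string cells (NaN) count as zero commas."""
    counts = [s.count(",") if isinstance(s, str) else 0
              for s in frame["CODES"].tolist()]
    return [c > 0 for c in counts], counts


# Both approaches should agree before their timings are compared.
assert with_str_methods(df) == with_list_comp(df)

for fn in (with_str_methods, with_list_comp):
    seconds = timeit.timeit(lambda: fn(df), number=20)
    print(f"{fn.__name__}: {seconds:.4f} s for 20 runs")
```

Checking that both functions return the same lists first makes sure the timing compares like with like.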

Answer 1: 1, accepted, 2022-08-01 13:22:18
Answer 2: 1, 2022-09-05 02:59:39
