Check a column for substring to assign new columns with values
This is my dataframe with 2 columns:
ID CODES
36233 LEH,PW
6175 N/A
6242
6680 MS,XL,JFK
In column CODES, I need to check whether each value contains a comma (","), count the commas, and return both results as new columns in the dataframe:
Output:
ID CODES HAS COMMA NO. OF COMMAS
36233 LEH,PW TRUE 1
6175 N/A FALSE 0
6242 FALSE 0
6680 MS,XL,JFK TRUE 2
So far I've tried
DF['HAS COMMA'] = np.where(DF['CODE'].str.contains(','),True, False)
but this returns TRUE where there are blanks. :(
Additionally,
DF['NO OF COMMAs']=DF['CODE'].count(",")
returns an error.
How about with:
df['HAS COMMA'] = df.CODES.str.contains(',').fillna(False)
df['NO. OF COMMA'] = df.CODES.str.count(',').fillna(0)
prints:
ID CODES HAS COMMA NO. OF COMMA
0 36233 LEH,PW True 1.0
1 6175 N/A False 0.0
2 6242 NaN False 0.0
3 6680 MS,XL,JFK True 2.0
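For a self-contained version of the approach above, here is a minimal sketch. It rebuilds the sample frame under two assumptions that the question does not confirm: the blank CODES cell is a missing value (NaN), and "N/A" is a literal string. It passes `na=False` to `str.contains` instead of calling `fillna`, and the integer cast is an addition of mine, since the counts otherwise print as floats.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the blank CODES cell is assumed to be NaN.
df = pd.DataFrame({
    "ID": [36233, 6175, 6242, 6680],
    "CODES": ["LEH,PW", "N/A", np.nan, "MS,XL,JFK"],
})

# na=False makes missing values evaluate to False instead of NaN.
# regex=False treats "," as a literal substring rather than a pattern.
df["HAS COMMA"] = df["CODES"].str.contains(",", regex=False, na=False)

# str.count returns NaN for missing values, so fill with 0 and cast to int.
df["NO. OF COMMAS"] = df["CODES"].str.count(",").fillna(0).astype(int)

print(df)
```

This probably also explains the TRUE on blank rows in the question: `str.contains` leaves NaN for missing cells, and NaN is truthy, so `np.where` picks the `True` branch for them. The error from `DF['CODE'].count(",")` comes from calling `Series.count` rather than the string accessor `Series.str.count`.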
Pandas string methods are not optimized, so a Python list comprehension would be more efficient for this task. For example, the code below is about 8 times faster than the equivalent pandas str methods for a df with 4k rows.
Simply check whether a comma exists in each value of df.CODES and decide whether to count or not.
df[['HAS COMMA', 'NO. OF COMMA']] = [[True, s.count(',')] if ',' in s else [False, 0] for s in df['CODES'].tolist()]
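One caveat: the comprehension above assumes every cell in CODES is a string. If the blank cell is actually NaN (a float), `',' in s` raises a TypeError. A hedged variant that guards against non-string cells might look like the sketch below. The helper name `comma_stats` is hypothetical, and the sample frame assumes the blank cell is NaN.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the blank CODES cell is assumed to be NaN.
df = pd.DataFrame({
    "ID": [36233, 6175, 6242, 6680],
    "CODES": ["LEH,PW", "N/A", np.nan, "MS,XL,JFK"],
})

def comma_stats(value):
    """Return [has_comma, comma_count]; non-string cells count as no commas."""
    # comma_stats is a hypothetical helper, not part of pandas.
    if isinstance(value, str) and "," in value:
        return [True, value.count(",")]
    return [False, 0]

# Same two-column assignment as above, with the NaN guard applied per cell.
df[["HAS COMMA", "NO. OF COMMA"]] = [comma_stats(s) for s in df["CODES"].tolist()]

print(df)
```

The `isinstance` check adds a little work per cell, so the speedup quoted above may not carry over exactly to this variant.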