
Check a column for substring to assign new columns with values

This is my dataframe with 2 columns:

ID      CODES
36233   LEH,PW
6175    N/A
6242    
6680    MS,XL,JFK

In column CODES, I need to check whether each value contains a comma (","), count the number of commas, and return both results as new columns in the dataframe:

Desired output:

ID      CODES       HAS COMMA   NO. OF COMMAS
36233   LEH,PW      TRUE        1
6175    N/A         FALSE       0
6242                FALSE       0
6680    MS,XL,JFK   TRUE        2

So far I've tried DF['HAS COMMA'] = np.where(DF['CODE'].str.contains(','),True, False) but this returns TRUE where there are blanks. :(

Additionally, DF['NO OF COMMAs']=DF['CODE'].count(",") returns an error.

How about:

df['HAS COMMA'] = df.CODES.str.contains(',').fillna(False)
df['NO. OF COMMA'] =  df.CODES.str.count(',').fillna(0)

prints:

      ID      CODES  HAS COMMA  NO. OF COMMA
0  36233     LEH,PW       True           1.0
1   6175        N/A      False           0.0
2   6242        NaN      False           0.0
3   6680  MS,XL,JFK       True           2.0
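
A note on why this works: the blank cell in CODES is NaN, and str.contains returns NaN for it rather than False. np.where treats NaN as truthy, which is why the original attempt gave TRUE on blank rows. Below is a minimal sketch of a variant, assuming the same frame as above (rebuilt by hand here, so the construction is not from the original post). It passes na=False to str.contains and casts the counts to integers so they print as 1 rather than 1.0.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt by hand from the question; the blank cell is NaN.
df = pd.DataFrame({
    "ID": [36233, 6175, 6242, 6680],
    "CODES": ["LEH,PW", "N/A", np.nan, "MS,XL,JFK"],
})

# na=False makes missing values count as "no comma" directly;
# regex=False treats "," as a literal string rather than a pattern.
df["HAS COMMA"] = df["CODES"].str.contains(",", regex=False, na=False)

# str.count returns NaN for missing values, so fill and cast back to int.
df["NO. OF COMMAS"] = df["CODES"].str.count(",").fillna(0).astype(int)

print(df)
```

The column name NO. OF COMMAS follows the question's desired output; rename it as needed.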

Pandas string methods are not optimized, so a Python list comprehension would be more efficient for this task. For example, the code below is about 8 times faster than the equivalent pandas str methods for a df with 4k rows.

Simply check whether a comma exists in each value of df.CODES and decide whether to count or not.

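# the isinstance check skips blank cells, which are NaN (a float); ',' in NaN would raise TypeError
df[['HAS COMMA', 'NO. OF COMMA']] = [[True, s.count(',')] if isinstance(s, str) and ',' in s else [False, 0] for s in df['CODES'].tolist()]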
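
To check the speed claim on your own machine, here is a rough timing sketch. It makes some assumptions: the 4k-row frame is hypothetical (the question's four rows repeated 1000 times), the function names are made up for illustration, and non-string values such as NaN are treated as having no commas. Timings will vary with pandas version and hardware, so no expected numbers are given.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical 4k-row frame: the question's four CODES values repeated 1000 times.
codes = ["LEH,PW", "N/A", np.nan, "MS,XL,JFK"] * 1000
df = pd.DataFrame({"CODES": codes})


def with_str_methods(frame):
    """Vectorised pandas string methods, with NaN handled explicitly."""
    has = frame["CODES"].str.contains(",", regex=False, na=False)
    cnt = frame["CODES"].str.count(",").fillna(0).astype(int)
    return has.tolist(), cnt.tolist()


def with_comprehension(frame):
    """Plain Python loop; non-string values (NaN) count as zero commas."""
    cnt = [s.count(",") if isinstance(s, str) else 0
           for s in frame["CODES"].tolist()]
    has = [n > 0 for n in cnt]
    return has, cnt


# Both approaches should agree before their speed is compared.
assert with_str_methods(df) == with_comprehension(df)

t_str = timeit.timeit(lambda: with_str_methods(df), number=20)
t_comp = timeit.timeit(lambda: with_comprehension(df), number=20)
print(f"str methods: {t_str:.4f}s, comprehension: {t_comp:.4f}s")
```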


