
Python Pandas DataFrame: create a column with the number of occurrences of a string in other columns

I have a DataFrame and I want to count how many times a string (say 'Yes') occurs across all the other columns of each row. I want to store that count in a new column called 'Yes-Count'.

I have it working using a lambda, following the example in "Creating a new column based on if-elif-else condition".

I am curious whether this can be done in one line.

Here is the sample data and code.

import pandas as pd

def finalCount(row):
    count = 0
    if row['Col1'] == 'Yes':
        count = count + 1 
    if row['Col2'] == 'Yes':
        count = count + 1 
    if row['Col3'] == 'Yes':
        count = count + 1
    if row['Col4'] == 'Yes':
        count = count + 1
    return count

data = {
         'Col1': ['Yes', 1, 'No', 'Yes'],
         'Col2': ['Yes', 2, 'No', 'Yes'],
         'Col3': ['No', 3, 'Yes', 'Yes'],
         'Col4': ['Yes', 4, 'No', 'Yes'],
    }
dfData = pd.DataFrame(data, columns= ['Col1','Col2','Col3','Col4'])
dfData['Yes-Count'] = dfData.apply(finalCount, axis =1)

I get the result as expected.


Is there a way to get rid of the finalCount function and do this in one line?

Here's one way using a boolean mask and sum:

dfData["Yes-Count"] = dfData.eq('Yes').sum(axis=1)
print(dfData)
#  Col1 Col2 Col3 Col4  Yes-Count
#0  Yes  Yes   No  Yes          3
#1    1    2    3    4          0
#2   No   No  Yes   No          1
#3  Yes  Yes  Yes  Yes          4

Explanation

  • dfData.eq("Yes") returns a DataFrame of the same shape, with boolean values indicating whether the value at each location equals "Yes"
  • Sum these across the columns (axis=1); each True counts as 1, so this gives one count per row
  • Assign the output back as a new column
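
The three steps above can be sketched separately to make the intermediate boolean mask visible. This is only an illustration on the question's sample data; the names cols, mask and yes_count are assumptions introduced here, and restricting the comparison to an explicit column list is an optional choice, not part of the original answer.

```python
import pandas as pd

# Sample data from the question.
dfData = pd.DataFrame({
    'Col1': ['Yes', 1, 'No', 'Yes'],
    'Col2': ['Yes', 2, 'No', 'Yes'],
    'Col3': ['No', 3, 'Yes', 'Yes'],
    'Col4': ['Yes', 4, 'No', 'Yes'],
})

# Hypothetical column list: limiting the comparison to these columns
# keeps any other columns in the frame out of the count.
cols = ['Col1', 'Col2', 'Col3', 'Col4']

# Step 1: boolean mask with the same shape as dfData[cols].
mask = dfData[cols].eq('Yes')

# Step 2: sum along each row; True is counted as 1, False as 0.
yes_count = mask.sum(axis=1)

# Step 3: assign the per-row counts back as a new column.
dfData['Yes-Count'] = yes_count

print(mask)
print(dfData)
```

Selecting the columns explicitly is mostly useful when the frame also holds columns that should not take part in the count.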

Here is another approach using the isin() function:

list_of_words = ['Yes']
dfData["Yes-Count"] = dfData.isin(list_of_words).sum(axis='columns')

With this approach you can compare the DataFrame elements against multiple values. The isin() function returns a boolean DataFrame showing whether each element matches any of the words in list_of_words.
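
As a rough sketch of the multiple-value case, the same pattern can count cells that match any entry in a longer list. The list contents and the column name 'Yes-No-Count' below are assumptions chosen for illustration; only the sample data comes from the question.

```python
import pandas as pd

# Sample data from the question.
dfData = pd.DataFrame({
    'Col1': ['Yes', 1, 'No', 'Yes'],
    'Col2': ['Yes', 2, 'No', 'Yes'],
    'Col3': ['No', 3, 'Yes', 'Yes'],
    'Col4': ['Yes', 4, 'No', 'Yes'],
})

# Hypothetical list: count cells equal to either 'Yes' or 'No'.
list_of_words = ['Yes', 'No']

# isin() marks every cell that matches any listed value;
# summing along the columns gives one count per row.
dfData['Yes-No-Count'] = dfData.isin(list_of_words).sum(axis='columns')

print(dfData)
```

Here the row of integers gets a count of 0, while every other row gets 4 because each of its cells is either 'Yes' or 'No'.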


