
How do I count the occurrence of string values based on a condition in a dataframe?

I have the following dataframe:

        Event_Type Roster_Designation
4          Assist               Male
5            Goal             Female
12         Assist             Female
13           Goal               Male
46           Goal               Male
...           ...                ...
207095       Goal             Female
207108     Assist               Male
207109       Goal               Male
207118     Assist             Female
207119       Goal             Female

What I want to know is how many goals scored by females are assisted by males, and how many goals scored by females are assisted by females. Then the same the other way around (e.g. female assists -> male goals, male assists -> male goals).

import pandas as pd

# Test data; the expected result is 1 for each category.
data = [['Assist', 'Male'], ['Goal', 'Female'], ['Assist', 'Female'], ['Goal', 'Male'], ['Goal', 'Male'], ['Goal', 'Female'], ['Assist', 'Male'], ['Goal', 'Male'], ['Assist', 'Female'], ['Goal', 'Female']]
df = pd.DataFrame(data, columns=['Event_Type', 'Roster_Designation'])

# .shift() moves each column down one row, so it compares every row with the
# row directly above it. A goal is counted when that previous row is an assist.
female_female = len(df[(df['Event_Type'] == 'Goal') & (df['Roster_Designation'] == 'Female') & (df['Roster_Designation'].shift() == 'Female') & (df['Event_Type'].shift() == 'Assist')])
female_male = len(df[(df['Event_Type'] == 'Goal') & (df['Roster_Designation'] == 'Female') & (df['Roster_Designation'].shift() == 'Male') & (df['Event_Type'].shift() == 'Assist')])
male_male = len(df[(df['Event_Type'] == 'Goal') & (df['Roster_Designation'] == 'Male') & (df['Roster_Designation'].shift() == 'Male') & (df['Event_Type'].shift() == 'Assist')])
male_female = len(df[(df['Event_Type'] == 'Goal') & (df['Roster_Designation'] == 'Male') & (df['Roster_Designation'].shift() == 'Female') & (df['Event_Type'].shift() == 'Assist')])

print(f"Female goals assisted by Females: {female_female}")
print(f"Female goals assisted by Males: {female_male}")
print(f"Male goals assisted by Males: {male_male}")
print(f"Male goals assisted by Females: {male_female}")

Output:
Female goals assisted by Females: 1
Female goals assisted by Males: 1
Male goals assisted by Males: 1
Male goals assisted by Females: 1
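A more compact variant of the same shift() approach builds all four counts in one table with pd.crosstab. This is a sketch under the same assumption as the code above: an assist always sits in the row directly above the goal it belongs to. The names prev_type, prev_gender, assisted_goal and counts are illustrative.

```python
import pandas as pd

# Same test data as above (positional index 0..9).
data = [['Assist', 'Male'], ['Goal', 'Female'], ['Assist', 'Female'], ['Goal', 'Male'],
        ['Goal', 'Male'], ['Goal', 'Female'], ['Assist', 'Male'], ['Goal', 'Male'],
        ['Assist', 'Female'], ['Goal', 'Female']]
df = pd.DataFrame(data, columns=['Event_Type', 'Roster_Designation'])

# Previous row's values, aligned with the current row (positional shift).
prev_type = df['Event_Type'].shift()
prev_gender = df['Roster_Designation'].shift()

# Assumption: a goal counts only if the row directly above it is an assist.
assisted_goal = (df['Event_Type'] == 'Goal') & (prev_type == 'Assist')

# Rows: scorer gender; columns: assister gender.
counts = pd.crosstab(
    df.loc[assisted_goal, 'Roster_Designation'].rename('Scorer'),
    prev_gender[assisted_goal].rename('Assister'),
)
print(counts)
```

counts.loc['Female', 'Male'] then reads as "female goals assisted by males". A gender that never appears as scorer (or as assister) among assisted goals will be absent from the table rather than shown as a row or column of zeros.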

Explanation of the variable assignments:

Everything is wrapped in len() so that each variable ends up holding a number (the count of matching rows) rather than a filtered dataframe.
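As an aside, len() of the filtered frame is only used to get a number. Summing the boolean mask directly gives the same count, since True counts as 1. A small illustration on a made-up three-row frame:

```python
import pandas as pd

# Hypothetical three-row frame, just to compare the two counting styles.
df = pd.DataFrame({'Event_Type': ['Assist', 'Goal', 'Goal'],
                   'Roster_Designation': ['Male', 'Female', 'Male']})

mask = (df['Event_Type'] == 'Goal') & (df['Roster_Designation'] == 'Female')

# len() of the filtered frame counts the matching rows...
n_len = len(df[mask])
# ...which equals the sum of the boolean mask (True == 1, False == 0).
n_sum = int(mask.sum())
print(n_len, n_sum)  # 1 1
```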

The four conditions checked (using female_male, i.e. female goals assisted by males, as the example) are:

  • Is the current value of Event_Type == 'Goal'?
  • Is the current value of Roster_Designation == 'Female'?
  • Is the previous value (via shift()) of Roster_Designation == 'Male'?
  • Is the previous value (via shift()) of Event_Type == 'Assist'?

If all of the above are true, the row is counted.
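To see what the shift()-based conditions actually compare, it can help to put the shifted columns next to the originals. This is a minimal sketch on a hypothetical three-row frame; the Prev_* column names are made up for illustration.

```python
import pandas as pd

# Hypothetical three-row frame: an assist, then two goals.
df = pd.DataFrame({"Event_Type": ["Assist", "Goal", "Goal"],
                   "Roster_Designation": ["Male", "Female", "Male"]})

# shift() moves each column down one row, so every row sees its predecessor.
view = df.assign(
    Prev_Event_Type=df["Event_Type"].shift(),
    Prev_Roster_Designation=df["Roster_Designation"].shift(),
)
print(view)

# The four conditions from the list above, for "female goal, male assist".
female_goal_male_assist = (
    (view["Event_Type"] == "Goal")
    & (view["Roster_Designation"] == "Female")
    & (view["Prev_Roster_Designation"] == "Male")
    & (view["Prev_Event_Type"] == "Assist")
)
print(int(female_goal_male_assist.sum()))  # 1
```

Row 1 satisfies all four conditions. Row 2 fails because the row above it is a Goal. Row 0 has no previous row at all (NaN), so it never matches.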

How about this:

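import pandas as pd

# For every goal, the candidate assister is the row labelled one less
# (this relies on the original, unfiltered index labels being consecutive).
assister_index = pd.Series(
    df[df["Event_Type"] == "Goal"].index.to_series() - 1,
    index=df.index,
    dtype=pd.Int64Dtype(),
)
# Discard candidates that are missing from df or are not Assist rows.
assist_labels = df.index[df["Event_Type"] == "Assist"]
assister_index[~assister_index.isin(assist_labels)] = pd.NA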

>>> assister_index
4           <NA>
5              4
12          <NA>
13            12
46          <NA>
207095      <NA>
207108      <NA>
207109    207108
207118      <NA>
207119    207118
dtype: Int64

df["Assister_Gender"] = assister_index.map(
    lambda i: df.loc[i, "Roster_Designation"], na_action="ignore"
) 
>>> df
       Event_Type Roster_Designation Assister_Gender
4          Assist               Male            <NA>
5            Goal             Female            Male
12         Assist             Female            <NA>
13           Goal               Male          Female
46           Goal               Male            <NA>
207095       Goal             Female            <NA>
207108     Assist               Male            <NA>
207109       Goal               Male            Male
207118     Assist             Female            <NA>
207119       Goal             Female          Female    
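The per-row lambda in map() can also be replaced by a vectorised lookup. This sketch assumes, as the answer does, that an assist carries the index label exactly one less than its goal and that the index is unique. reindex() fetches the row labelled i - 1 for every row and returns NaN where no such label exists. The result is then kept only for goals whose looked-up row is an Assist. prev and is_assisted_goal are illustrative names.

```python
import pandas as pd

# Sample rows from the question, keeping their original index labels.
df = pd.DataFrame(
    {
        "Event_Type": ["Assist", "Goal", "Assist", "Goal", "Goal",
                       "Goal", "Assist", "Goal", "Assist", "Goal"],
        "Roster_Designation": ["Male", "Female", "Female", "Male", "Male",
                               "Female", "Male", "Male", "Female", "Female"],
    },
    index=[4, 5, 12, 13, 46, 207095, 207108, 207109, 207118, 207119],
)

# Look up the row whose label is one less than the current label.
# Labels that do not exist (e.g. 45) come back as all-NaN rows.
prev = df.reindex(df.index - 1)
prev.index = df.index  # realign the looked-up rows with the current rows

# Keep the assister gender only for goals preceded by an Assist row.
is_assisted_goal = (df["Event_Type"] == "Goal") & (prev["Event_Type"] == "Assist")
df["Assister_Gender"] = prev["Roster_Designation"].where(is_assisted_goal)
print(df)
```

This yields the same Assister_Gender values as the table above, with NaN in place of <NA>.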

Now just count the rows matching your desired conditions, e.g.:

(
    (df["Event_Type"] == "Goal")
    & (df["Roster_Designation"] == "Female")
    & (df["Assister_Gender"] == "Male")
).sum()
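To get all four combinations at once instead of writing one expression per pairing, the assisted goals can be grouped by scorer gender and assister gender. This sketch rebuilds the frame shown above, Assister_Gender column included, from the sample rows. The names assisted_goals and counts are illustrative.

```python
import pandas as pd

# The sample rows from the question, with the Assister_Gender column
# as computed above (None where there is no matching assist).
df = pd.DataFrame(
    {
        "Event_Type": ["Assist", "Goal", "Assist", "Goal", "Goal",
                       "Goal", "Assist", "Goal", "Assist", "Goal"],
        "Roster_Designation": ["Male", "Female", "Female", "Male", "Male",
                               "Female", "Male", "Male", "Female", "Female"],
        "Assister_Gender": [None, "Male", None, "Female", None,
                            None, None, "Male", None, "Female"],
    },
    index=[4, 5, 12, 13, 46, 207095, 207108, 207109, 207118, 207119],
)

# Keep only goals that have an identified assister.
assisted_goals = df[(df["Event_Type"] == "Goal") & df["Assister_Gender"].notna()]

# One count per (scorer gender, assister gender) pair.
counts = assisted_goals.groupby(["Roster_Designation", "Assister_Gender"]).size()
print(counts)
```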


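Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.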

Related questions

  • How do I get a series with certain values based on a condition in a dataframe?
  • Pandas DataFrame: Find unique words in string column, count their occurrence and sum values in another column on condition
  • How do I get the count of string occurrence in python?
  • How to Arrange Dataframe values Based on the Occurrence
  • Failed to count occurrence of values in dataFrame based on few columns with groupby
  • How do i replace the values of a dataframe on a condition
  • How do I count values in one dataframe based on the conditions in another dataframe
  • How do I get the values of a particular column in a pandas dataframe based on a date condition on another column?
  • How do I fill in missing values using another dataframe based on date condition, but which are different shapes?
  • How do I apply the string split method on a pandas dataframe based on a condition?
 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM