How to count occurrences of string values based on conditions in a dataframe?

Question

I have the following dataframe:

       Event_Type Roster_Designation
4          Assist               Male
5            Goal             Female
12         Assist             Female
13           Goal               Male
46           Goal               Male
...           ...                ...
207095       Goal             Female
207108     Assist               Male
207109       Goal               Male
207118     Assist             Female
207119       Goal             Female

What I want to know is: how many Goals scored by Females are Assisted by Males? How many Goals scored by Females are Assisted by Females? And then vice versa (e.g. Female assists -> Male goals, Male assists -> Male goals).

Answer 1

import pandas as pd

# test data. expected result is 1 for each category
data = [['Assist', 'Male'], ['Goal', 'Female'], ['Assist', 'Female'], ['Goal', 'Male'], ['Goal', 'Male'], ['Goal', 'Female'], ['Assist', 'Male'], ['Goal', 'Male'], ['Assist', 'Female'], ['Goal', 'Female']]
df = pd.DataFrame(data, columns=['Event_Type', 'Roster_Designation'])

female_female = len(df[(df['Event_Type'] == 'Goal') & (df['Roster_Designation'] == 'Female') & (df['Roster_Designation'].shift() == 'Female') & (df['Event_Type'].shift() == 'Assist')])
female_male = len(df[(df['Event_Type'] == 'Goal') & (df['Roster_Designation'] == 'Female') & (df['Roster_Designation'].shift() == 'Male') & (df['Event_Type'].shift() == 'Assist')])
male_male = len(df[(df['Event_Type'] == 'Goal') & (df['Roster_Designation'] == 'Male') & (df['Roster_Designation'].shift() == 'Male') & (df['Event_Type'].shift() == 'Assist')])
male_female = len(df[(df['Event_Type'] == 'Goal') & (df['Roster_Designation'] == 'Male') & (df['Roster_Designation'].shift() == 'Female') & (df['Event_Type'].shift() == 'Assist')])

print(f"Female goals assisted by Females: {female_female}")
print(f"Female goals assisted by Males: {female_male}")
print(f"Male goals assisted by Males: {male_male}")
print(f"Male goals assisted by Females: {male_female}")

Output:
Female goals assisted by Females: 1
Female goals assisted by Males: 1
Male goals assisted by Males: 1
Male goals assisted by Females: 1

Explanation of the variable assignments:

Each filter is wrapped in len() so that the value assigned is a number: the count of rows that match.
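
The len() wrapper counts the rows that survive the boolean filter. As a small sketch of the same idea on a shortened copy of the test data, summing the boolean mask gives the same number, because True counts as 1. The names mask, count_via_len and count_via_sum are only illustrative and do not appear in the answer.

```python
import pandas as pd

data = [['Assist', 'Male'], ['Goal', 'Female'], ['Assist', 'Female'], ['Goal', 'Male']]
df = pd.DataFrame(data, columns=['Event_Type', 'Roster_Designation'])

# Boolean Series: True where a Female goal sits directly below a Male assist.
mask = (
    (df['Event_Type'] == 'Goal')
    & (df['Roster_Designation'] == 'Female')
    & (df['Roster_Designation'].shift() == 'Male')
    & (df['Event_Type'].shift() == 'Assist')
)

count_via_len = len(df[mask])     # filter the rows, then count them
count_via_sum = int(mask.sum())   # True counts as 1, False as 0

print(count_via_len, count_via_sum)
```

Either form works; len(df[mask]) builds the filtered dataframe first, while mask.sum() counts without materialising it.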

The 4 conditions checked (taking a Female goal assisted by a Male as the example) are:

Is the current row's Event_Type == 'Goal'?
Is the current row's Roster_Designation == 'Female'?
Is the previous row's Roster_Designation (via shift()) == 'Male'?
Is the previous row's Event_Type (via shift()) == 'Assist'?

If all of the above are true, the row is counted.
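
The four filters differ only in which designations they compare, so the same shift() check can be tallied in one pass. The following is a minimal sketch, assuming (as the answer does) that an assist is always the row directly above its goal. The column names scorer and assister are made up for illustration.

```python
import pandas as pd

data = [['Assist', 'Male'], ['Goal', 'Female'], ['Assist', 'Female'], ['Goal', 'Male'],
        ['Goal', 'Male'], ['Goal', 'Female'], ['Assist', 'Male'], ['Goal', 'Male'],
        ['Assist', 'Female'], ['Goal', 'Female']]
df = pd.DataFrame(data, columns=['Event_Type', 'Roster_Designation'])

# shift() moves every value down one row, so each row lines up with the row above it.
prev = df.shift()

# Keep only goals whose preceding row is an assist.
assisted = (df['Event_Type'] == 'Goal') & (prev['Event_Type'] == 'Assist')

# Pair each assisted goal's scorer with the designation found on the row above.
pairs = pd.DataFrame({
    'scorer': df.loc[assisted, 'Roster_Designation'],
    'assister': prev.loc[assisted, 'Roster_Designation'],
})

# One count per (scorer, assister) combination.
counts = pairs.groupby(['scorer', 'assister']).size()
print(counts)
```

A combination that never occurs is simply absent from counts, so a lookup such as counts.get(('Female', 'Male'), 0) is a safer way to read a single value.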

Answer 2

How's this:

assister_index = pd.Series(
    df[df["Event_Type"] == "Goal"].index.to_series() - 1,
    index=df.index,
    dtype=pd.Int64Dtype(),
)
assister_index[~assister_index.isin(df.index)] = pd.NA

>>> assister_index
4           <NA>
5              4
12          <NA>
13            12
46          <NA>
207095      <NA>
207108      <NA>
207109    207108
207118      <NA>
207119    207118
dtype: Int64

df["Assister_Gender"] = assister_index.map(
    lambda i: df.loc[i, "Roster_Designation"], na_action="ignore"
)

>>> df
       Event_Type Roster_Designation Assister_Gender
4          Assist               Male            <NA>
5            Goal             Female            Male
12         Assist             Female            <NA>
13           Goal               Male          Female
46           Goal               Male            <NA>
207095       Goal             Female            <NA>
207108     Assist               Male            <NA>
207109       Goal               Male            Male
207118     Assist             Female            <NA>
207119       Goal             Female          Female

Now just count the rows matching your desired conditions, e.g.:

(
    (df["Event_Type"] == "Goal")
    & (df["Roster_Designation"] == "Female")
    & (df["Assister_Gender"] == "Male")
).sum()
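
To get every combination at once rather than one condition at a time, the label-minus-one lookup can be followed by a groupby. The sketch below is a variation, not the answer's code: it uses reindex() for the lookup, adds a check that the looked-up row really is an Assist, and is built only from the rows printed in the question (the elided rows are not reconstructed). The names goals, candidate, looked_up and counts are illustrative.

```python
import pandas as pd

# Only the rows visible in the question, with their original index labels.
df = pd.DataFrame(
    {
        "Event_Type": ["Assist", "Goal", "Assist", "Goal", "Goal",
                       "Goal", "Assist", "Goal", "Assist", "Goal"],
        "Roster_Designation": ["Male", "Female", "Female", "Male", "Male",
                               "Female", "Male", "Male", "Female", "Female"],
    },
    index=[4, 5, 12, 13, 46, 207095, 207108, 207109, 207118, 207119],
)

goals = df[df["Event_Type"] == "Goal"]

# Label of the row that would hold the assist: the goal's own label minus 1.
candidate = goals.index - 1

# reindex() yields NaN rows for labels that do not exist in df (e.g. 45).
looked_up = df.reindex(candidate)

# Extra check added in this sketch: the looked-up row must be an Assist.
assister = looked_up["Roster_Designation"].where(looked_up["Event_Type"] == "Assist")

# Attach the result to the goal rows; all other rows get NaN through index alignment.
df["Assister_Gender"] = pd.Series(assister.to_numpy(), index=goals.index)

# groupby drops rows whose key is NaN, so only assisted goals are counted.
counts = df.groupby(["Roster_Designation", "Assister_Gender"]).size()
print(counts)
```

Note that this lookup treats a goal as assisted only when a row labelled exactly one less exists, whereas shift() looks at whichever row sits directly above. The two can disagree when the index has gaps, and which behaviour is right depends on how the original data was filtered.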

2 answers

Answer 1: 0, accepted, 2021-03-16 18:36:59
Answer 2: 0, 2021-03-16 19:05:37
