How do I count occurrences of string values in a dataframe based on conditions?

Question

I have the following dataframe:

        Event_Type Roster_Designation
4          Assist               Male
5            Goal             Female
12         Assist             Female
13           Goal               Male
46           Goal               Male
...           ...                ...
207095       Goal             Female
207108     Assist               Male
207109       Goal               Male
207118     Assist             Female
207119       Goal             Female

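I want to know how many goals scored by females were assisted by males, and how many female goals were assisted by females. Then the same in reverse (e.g., female assist -> male goal, male assist -> male goal).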

Answer 1

import pandas as pd

# test data. expected result is 1 for each category
data = [['Assist', 'Male'], ['Goal', 'Female'], ['Assist', 'Female'], ['Goal', 'Male'], ['Goal', 'Male'], ['Goal', 'Female'], ['Assist', 'Male'], ['Goal', 'Male'], ['Assist', 'Female'], ['Goal', 'Female']]
df = pd.DataFrame(data, columns=['Event_Type', 'Roster_Designation'])

female_female = len(df[(df['Event_Type'] == 'Goal') & (df['Roster_Designation'] == 'Female') & (df['Roster_Designation'].shift() == 'Female') & (df['Event_Type'].shift() == 'Assist')])
female_male = len(df[(df['Event_Type'] == 'Goal') & (df['Roster_Designation'] == 'Female') & (df['Roster_Designation'].shift() == 'Male') & (df['Event_Type'].shift() == 'Assist')])
male_male = len(df[(df['Event_Type'] == 'Goal') & (df['Roster_Designation'] == 'Male') & (df['Roster_Designation'].shift() == 'Male') & (df['Event_Type'].shift() == 'Assist')])
male_female = len(df[(df['Event_Type'] == 'Goal') & (df['Roster_Designation'] == 'Male') & (df['Roster_Designation'].shift() == 'Female') & (df['Event_Type'].shift() == 'Assist')])

print(f"Female goals assisted by Females: {female_female}")
print(f"Female goals assisted by Males: {female_male}")
print(f"Male goals assisted by Males: {male_male}")
print(f"Male goals assisted by Females: {male_female}")

Output:
Female goals assisted by Females: 1
Female goals assisted by Males: 1
Male goals assisted by Males: 1
Male goals assisted by Females: 1

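Explanation of the variable assignments:

Everything is wrapped in len() so that each variable ends up holding a number: the count of rows that match.

The 4 conditions being checked (using female_male as the example) are:

Is the current value of Event_Type == 'Goal'?
Is the current value of Roster_Designation == 'Female'?
Is the previous value (via shift()) of Roster_Designation == 'Male'?
Is the previous value (via shift()) of Event_Type == 'Assist'?

If all of the above are true, the row is counted.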
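
The four near-identical filters above can also be collapsed into one table. This is a minimal sketch, assuming (as Answer 1 does) that an assist always sits in the row directly above the goal it belongs to. The names prev, assisted and counts are illustrative and not part of the original answer.

```python
import pandas as pd

# Same test data as in Answer 1.
data = [['Assist', 'Male'], ['Goal', 'Female'], ['Assist', 'Female'], ['Goal', 'Male'],
        ['Goal', 'Male'], ['Goal', 'Female'], ['Assist', 'Male'], ['Goal', 'Male'],
        ['Assist', 'Female'], ['Goal', 'Female']]
df = pd.DataFrame(data, columns=['Event_Type', 'Roster_Designation'])

# prev holds, for every row, the values of the row directly above it.
prev = df.shift()

# A goal is treated as assisted only if the previous row is an assist.
assisted = (df['Event_Type'] == 'Goal') & (prev['Event_Type'] == 'Assist')

# Cross-tabulate the assister's designation against the scorer's designation.
counts = pd.crosstab(
    prev.loc[assisted, 'Roster_Designation'].rename('Assister'),
    df.loc[assisted, 'Roster_Designation'].rename('Scorer'),
)
print(counts)
```

On the test data every cell of counts should be 1, matching the four printed values above. A single combination can be read with, for example, counts.loc['Male', 'Female'] (male assist, female goal).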

Answer 2

How about this:

assister_index = pd.Series(
    df[df["Event_Type"] == "Goal"].index.to_series() - 1,
    index=df.index,
    dtype=pd.Int64Dtype(),
)
assister_index[~assister_index.isin(df.index)] = pd.NA

>>> assister_index
4           <NA>
5              4
12          <NA>
13            12
46          <NA>
207095      <NA>
207108      <NA>
207109    207108
207118      <NA>
207119    207118
dtype: Int64

df["Assister_Gender"] = assister_index.map(
    lambda i: df.loc[i, "Roster_Designation"], na_action="ignore"
)

>>> df
       Event_Type Roster_Designation Assister_Gender
4          Assist               Male            <NA>
5            Goal             Female            Male
12         Assist             Female            <NA>
13           Goal               Male          Female
46           Goal               Male            <NA>
207095       Goal             Female            <NA>
207108     Assist               Male            <NA>
207109       Goal               Male            Male
207118     Assist             Female            <NA>
207119       Goal             Female          Female

Now just count the rows that match the conditions you need, for example:

(
    (df["Event_Type"] == "Goal")
    & (df["Roster_Designation"] == "Female")
    & (df["Assister_Gender"] == "Male")
).sum()
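
One caveat: the lookup above only checks that the index label one below a goal exists, not that the row at that label is actually an assist. The sketch below adds that check and counts all four combinations in one pass. It assumes the index labels shown in the question and that an assist, when present, has a label exactly one less than its goal. The names goals, candidates, pairs and counts are hypothetical.

```python
import pandas as pd

# The ten rows visible in the question, with their original index labels.
df = pd.DataFrame(
    {
        "Event_Type": ["Assist", "Goal", "Assist", "Goal", "Goal",
                       "Goal", "Assist", "Goal", "Assist", "Goal"],
        "Roster_Designation": ["Male", "Female", "Female", "Male", "Male",
                               "Female", "Male", "Male", "Female", "Female"],
    },
    index=[4, 5, 12, 13, 46, 207095, 207108, 207109, 207118, 207119],
)

goals = df[df["Event_Type"] == "Goal"]

# Look up the row labelled one less than each goal; missing labels become NaN rows.
candidates = df.reindex(goals.index - 1)

# Keep only goals whose preceding label exists and is an assist.
is_assist = (candidates["Event_Type"] == "Assist").to_numpy()

pairs = pd.DataFrame(
    {
        "Scorer": goals["Roster_Designation"].to_numpy()[is_assist],
        "Assister": candidates["Roster_Designation"].to_numpy()[is_assist],
    }
)

# One count per (Scorer, Assister) combination.
counts = pairs.groupby(["Scorer", "Assister"]).size()
print(counts)
```

For instance, counts.loc[("Female", "Male")] gives the number of female goals assisted by males. On the ten rows shown, each of the four combinations occurs once.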
