How do I count the occurrences of string values based on a condition in a dataframe?
I have the following dataframe:
       Event_Type Roster_Designation
4          Assist               Male
5            Goal             Female
12         Assist             Female
13           Goal               Male
46           Goal               Male
...           ...                ...
207095       Goal             Female
207108     Assist               Male
207109       Goal               Male
207118     Assist             Female
207119       Goal             Female
What I want to know is: how many Goals scored by Females are Assisted by Males? How many Goals scored by Females are Assisted by Females? And then vice versa (e.g. Female assists -> Male goals, Male assists -> Male goals).
import pandas as pd
# test data. expected result is 1 for each category
data = [['Assist', 'Male'], ['Goal', 'Female'], ['Assist', 'Female'], ['Goal', 'Male'], ['Goal', 'Male'], ['Goal', 'Female'], ['Assist', 'Male'], ['Goal', 'Male'], ['Assist', 'Female'], ['Goal', 'Female']]
df = pd.DataFrame(data, columns=['Event_Type', 'Roster_Designation'])
female_female = len(df[(df['Event_Type'] == 'Goal') & (df['Roster_Designation'] == 'Female') & (df['Roster_Designation'].shift() == 'Female') & (df['Event_Type'].shift() == 'Assist')])
female_male = len(df[(df['Event_Type'] == 'Goal') & (df['Roster_Designation'] == 'Female') & (df['Roster_Designation'].shift() == 'Male') & (df['Event_Type'].shift() == 'Assist')])
male_male = len(df[(df['Event_Type'] == 'Goal') & (df['Roster_Designation'] == 'Male') & (df['Roster_Designation'].shift() == 'Male') & (df['Event_Type'].shift() == 'Assist')])
male_female = len(df[(df['Event_Type'] == 'Goal') & (df['Roster_Designation'] == 'Male') & (df['Roster_Designation'].shift() == 'Female') & (df['Event_Type'].shift() == 'Assist')])
print(f"Female goals assisted by Females: {female_female}")
print(f"Female goals assisted by Males: {female_male}")
print(f"Male goals assisted by Males: {male_male}")
print(f"Male goals assisted by Females: {male_female}")
Output:
Female goals assisted by Females: 1
Female goals assisted by Males: 1
Male goals assisted by Males: 1
Male goals assisted by Females: 1
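Explanation of the variable assignments:
We wrap each filter in len() so that the final value assigned is a count rather than a dataframe.
The 4 conditions we are checking for (a Female goal assisted by a Male, for example) are:
1. The current row's Event_Type is 'Goal'.
2. The current row's Roster_Designation is 'Female'.
3. The previous row's Roster_Designation, obtained with shift(), is 'Male'.
4. The previous row's Event_Type, obtained with shift(), is 'Assist'.
If all four are true, the row is counted.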
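The four near-identical filters can also be collapsed into one tabulation. This is only a minimal sketch on the same test data, assuming (as the code above does) that an assist is always the row immediately before the goal it belongs to. The names `prev_type`, `prev_gender`, `Assister` and `Scorer` are made up for illustration.

```python
import pandas as pd

# Same test data as above; each combination is expected to occur once.
data = [['Assist', 'Male'], ['Goal', 'Female'], ['Assist', 'Female'], ['Goal', 'Male'],
        ['Goal', 'Male'], ['Goal', 'Female'], ['Assist', 'Male'], ['Goal', 'Male'],
        ['Assist', 'Female'], ['Goal', 'Female']]
df = pd.DataFrame(data, columns=['Event_Type', 'Roster_Designation'])

# Values of the previous row, aligned with the current row.
prev_type = df['Event_Type'].shift()
prev_gender = df['Roster_Designation'].shift()

# Keep only goals whose immediately preceding row is an assist.
assisted = (df['Event_Type'] == 'Goal') & (prev_type == 'Assist')

# Rows: gender of the assister; columns: gender of the scorer.
counts = pd.crosstab(
    prev_gender[assisted].rename('Assister'),
    df.loc[assisted, 'Roster_Designation'].rename('Scorer'),
)
print(counts)
```

A single cell can then be read with, for example, `counts.loc['Male', 'Female']` for Female goals assisted by Males.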
How's this:
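# For each goal, the candidate assist is the row whose index label is one less.
assister_index = pd.Series(
    df[df["Event_Type"] == "Goal"].index.to_series() - 1,
    index=df.index,
    dtype=pd.Int64Dtype(),
)
# Discard candidates whose label is not an actual row of df.
assister_index[~assister_index.isin(df.index)] = pd.NA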
>>> assister_index
4           <NA>
5              4
12          <NA>
13            12
46          <NA>
207095      <NA>
207108      <NA>
207109    207108
207118      <NA>
207119    207118
dtype: Int64
df["Assister_Gender"] = assister_index.map(
    lambda i: df.loc[i, "Roster_Designation"], na_action="ignore"
)
>>> df
       Event_Type Roster_Designation Assister_Gender
4          Assist               Male            <NA>
5            Goal             Female            Male
12         Assist             Female            <NA>
13           Goal               Male          Female
46           Goal               Male            <NA>
207095       Goal             Female            <NA>
207108     Assist               Male            <NA>
207109       Goal               Male            Male
207118     Assist             Female            <NA>
207119       Goal             Female          Female
Now just count the rows matching your desired conditions, e.g.,
(
    (df["Event_Type"] == "Goal")
    & (df["Roster_Designation"] == "Female")
    & (df["Assister_Gender"] == "Male")
).sum()
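To get every assister/scorer combination in one pass rather than one filter per pair, something like the following might work. It is a sketch, not the answer's exact method: it replaces the `map` lookup with a `reindex`, keeps the same assumption that an assist sits at the index label one below its goal, and rebuilds the ten rows shown in the question. `Assister` and `Scorer` are illustrative column names.

```python
import pandas as pd

# The ten rows visible in the question, with their original index labels.
df = pd.DataFrame(
    {
        "Event_Type": ["Assist", "Goal", "Assist", "Goal", "Goal",
                       "Goal", "Assist", "Goal", "Assist", "Goal"],
        "Roster_Designation": ["Male", "Female", "Female", "Male", "Male",
                               "Female", "Male", "Male", "Female", "Female"],
    },
    index=[4, 5, 12, 13, 46, 207095, 207108, 207109, 207118, 207119],
)

# Gender of every assist, re-keyed by the label of the row it would precede.
assists = df.loc[df["Event_Type"] == "Assist", "Roster_Designation"]
assists.index = assists.index + 1

# Look up the assister for each goal; goals without a preceding assist get NaN.
goals = df[df["Event_Type"] == "Goal"]
assister = assists.reindex(goals.index)

# Count each (assister, scorer) pair; groupby drops the NaN assisters.
counts = (
    pd.DataFrame({"Assister": assister, "Scorer": goals["Roster_Designation"]})
    .groupby(["Assister", "Scorer"])
    .size()
)
print(counts)
```

Because only assist rows are re-keyed, a goal whose previous label happens to be another goal is not counted as assisted.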