How do I get a series with certain values based on a condition in a dataframe?
How do I count the occurrence of string values based on a condition in a dataframe?
I have the following dataframe:
Event_Type Roster_Designation
4 Assist Male
5 Goal Female
12 Assist Female
13 Goal Male
46 Goal Male
... ... ...
207095 Goal Female
207108 Assist Male
207109 Goal Male
207118 Assist Female
207119 Goal Female
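I would like to know how many goals scored by females were assisted by males, and how many female goals were assisted by females. Then the same the other way round (e.g. female assist -> male goal, male assist -> male goal).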
import pandas as pd
# test data. expected result is 1 for each category
data = [['Assist', 'Male'], ['Goal', 'Female'], ['Assist', 'Female'], ['Goal', 'Male'], ['Goal', 'Male'], ['Goal', 'Female'], ['Assist', 'Male'], ['Goal', 'Male'], ['Assist', 'Female'], ['Goal', 'Female']]
df = pd.DataFrame(data, columns=['Event_Type', 'Roster_Designation'])
female_female = len(df[(df['Event_Type'] == 'Goal') & (df['Roster_Designation'] == 'Female') & (df['Roster_Designation'].shift() == 'Female') & (df['Event_Type'].shift() == 'Assist')])
female_male = len(df[(df['Event_Type'] == 'Goal') & (df['Roster_Designation'] == 'Female') & (df['Roster_Designation'].shift() == 'Male') & (df['Event_Type'].shift() == 'Assist')])
male_male = len(df[(df['Event_Type'] == 'Goal') & (df['Roster_Designation'] == 'Male') & (df['Roster_Designation'].shift() == 'Male') & (df['Event_Type'].shift() == 'Assist')])
male_female = len(df[(df['Event_Type'] == 'Goal') & (df['Roster_Designation'] == 'Male') & (df['Roster_Designation'].shift() == 'Female') & (df['Event_Type'].shift() == 'Assist')])
print(f"Female goals assisted by Females: {female_female}")
print(f"Female goals assisted by Males: {female_male}")
print(f"Male goals assisted by Males: {male_male}")
print(f"Male goals assisted by Females: {male_female}")
Output:
Female goals assisted by Females: 1
Female goals assisted by Males: 1
Male goals assisted by Males: 1
Male goals assisted by Females: 1
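Explanation of the variable assignments:
We wrap everything in len() so that the final value assigned is a number (the count of matching rows).
The 4 conditions we check (taking female assist -> male goal as the example) are: the row's Event_Type is Goal; the row's Roster_Designation is Male; the previous row's Roster_Designation is Female; the previous row's Event_Type is Assist. shift() is what gives access to the previous row.
If all of the above are true, the row is counted.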
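The four counts above can also be produced in one pass. This is only a sketch, assuming the same test data as above and that an assist always sits on the row immediately before the goal it belongs to. The names Assister and Scorer are labels made up for this example.

```python
import pandas as pd

data = [['Assist', 'Male'], ['Goal', 'Female'], ['Assist', 'Female'], ['Goal', 'Male'], ['Goal', 'Male'],
        ['Goal', 'Female'], ['Assist', 'Male'], ['Goal', 'Male'], ['Assist', 'Female'], ['Goal', 'Female']]
df = pd.DataFrame(data, columns=['Event_Type', 'Roster_Designation'])

# shift() moves the values down by one position, so each row sees the row before it
prev_type = df['Event_Type'].shift()
prev_gender = df['Roster_Designation'].shift()

# keep only goals whose previous row is an assist
assisted = (df['Event_Type'] == 'Goal') & (prev_type == 'Assist')

# rows: gender of the assister, columns: gender of the scorer
counts = pd.crosstab(
    prev_gender[assisted].rename('Assister'),
    df.loc[assisted, 'Roster_Designation'].rename('Scorer'),
)
print(counts)
```

A single cell can then be read with, for example, counts.loc['Male', 'Female'] for female goals assisted by males.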
How about this:
assister_index = pd.Series(
df[df["Event_Type"] == "Goal"].index.to_series() - 1,
index=df.index,
dtype=pd.Int64Dtype(),
)
assister_index[~assister_index.isin(df.index)] = pd.NA
>>> assister_index
4 <NA>
5 4
12 <NA>
13 12
46 <NA>
207095 <NA>
207108 <NA>
207109 207108
207118 <NA>
207119 207118
dtype: Int64
df["Assister_Gender"] = assister_index.map(
lambda i: df.loc[i, "Roster_Designation"], na_action="ignore"
)
>>> df
Event_Type Roster_Designation Assister_Gender
4 Assist Male <NA>
5 Goal Female Male
12 Assist Female <NA>
13 Goal Male Female
46 Goal Male <NA>
207095 Goal Female <NA>
207108 Assist Male <NA>
207109 Goal Male Male
207118 Assist Female <NA>
207119 Goal Female Female
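Note that the lookup above takes whatever row sits at index - 1, without checking that it is an Assist. A possible variant, sketched here under the same assumption that an assist belongs to the goal whose index label is exactly one higher, starts from the assist rows instead. The frame below is rebuilt by hand from the rows shown in the question.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Event_Type": ["Assist", "Goal", "Assist", "Goal", "Goal",
                       "Goal", "Assist", "Goal", "Assist", "Goal"],
        "Roster_Designation": ["Male", "Female", "Female", "Male", "Male",
                               "Female", "Male", "Male", "Female", "Female"],
    },
    index=[4, 5, 12, 13, 46, 207095, 207108, 207109, 207118, 207119],
)

# gender of every assist, relabelled with the index of the event that follows it
assists = df.loc[df["Event_Type"] == "Assist", "Roster_Designation"].copy()
assists.index = assists.index + 1

# align on index labels; rows without a matching assist stay NaN,
# and where() blanks out anything that is not a goal
is_goal = df["Event_Type"] == "Goal"
df["Assister_Gender"] = assists.reindex(df.index).where(is_goal)
print(df)
```

On this data the result matches the column shown above, except that missing values appear as NaN rather than <NA>.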
Now just count the rows that match the condition you want, for example,
(
(df["Event_Type"] == "Goal")
& (df["Roster_Designation"] == "Female")
& (df["Assister_Gender"] == "Male")
).sum()
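To get all four combinations at once instead of one condition at a time, a groupby on the two gender columns is one option. A minimal sketch, assuming the frame already has the Assister_Gender column as shown above (rebuilt by hand here so the example runs on its own):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Event_Type": ["Assist", "Goal", "Assist", "Goal", "Goal",
                       "Goal", "Assist", "Goal", "Assist", "Goal"],
        "Roster_Designation": ["Male", "Female", "Female", "Male", "Male",
                               "Female", "Male", "Male", "Female", "Female"],
        "Assister_Gender": [None, "Male", None, "Female", None,
                            None, None, "Male", None, "Female"],
    },
    index=[4, 5, 12, 13, 46, 207095, 207108, 207109, 207118, 207119],
)

goals = df[df["Event_Type"] == "Goal"]

# groupby drops rows with a missing key by default, so unassisted goals are left out
counts = goals.groupby(["Assister_Gender", "Roster_Designation"]).size()
print(counts)
```

The result is a Series indexed by (assister gender, scorer gender), so counts.loc[("Male", "Female")] gives the number of female goals assisted by males.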