How do I count the occurrence of string values based on a condition in a dataframe?
I have the following dataframe:
Event_Type Roster_Designation
4 Assist Male
5 Goal Female
12 Assist Female
13 Goal Male
46 Goal Male
... ... ...
207095 Goal Female
207108 Assist Male
207109 Goal Male
207118 Assist Female
207119 Goal Female
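I'd like to know how many goals females scored with an assist from a male, and how many female goals were assisted by females. Then the other way round as well (e.g. female assist -> male goal, male assist -> male goal).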
import pandas as pd
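# Test data: the expected result is 1 for each category.
data = [['Assist', 'Male'], ['Goal', 'Female'], ['Assist', 'Female'], ['Goal', 'Male'], ['Goal', 'Male'], ['Goal', 'Female'], ['Assist', 'Male'], ['Goal', 'Male'], ['Assist', 'Female'], ['Goal', 'Female']]
df = pd.DataFrame(data, columns=['Event_Type', 'Roster_Designation'])
# .shift() moves every value down one row, so df[col].shift() is the previous row's value.
# Each count below keeps goals whose previous row is an Assist by the given gender.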
female_female = len(df[(df['Event_Type'] == 'Goal') & (df['Roster_Designation'] == 'Female') & (df['Roster_Designation'].shift() == 'Female') & (df['Event_Type'].shift() == 'Assist')])
female_male = len(df[(df['Event_Type'] == 'Goal') & (df['Roster_Designation'] == 'Female') & (df['Roster_Designation'].shift() == 'Male') & (df['Event_Type'].shift() == 'Assist')])
male_male = len(df[(df['Event_Type'] == 'Goal') & (df['Roster_Designation'] == 'Male') & (df['Roster_Designation'].shift() == 'Male') & (df['Event_Type'].shift() == 'Assist')])
male_female = len(df[(df['Event_Type'] == 'Goal') & (df['Roster_Designation'] == 'Male') & (df['Roster_Designation'].shift() == 'Female') & (df['Event_Type'].shift() == 'Assist')])
print(f"Female goals assisted by Females: {female_female}")
print(f"Female goals assisted by Males: {female_male}")
print(f"Male goals assisted by Males: {male_male}")
print(f"Male goals assisted by Females: {male_female}")
Output:
Female goals assisted by Females: 1
Female goals assisted by Males: 1
Male goals assisted by Males: 1
Male goals assisted by Females: 1
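Explanation of the variable assignments:
We wrap everything in len() so that each variable is assigned a number (the count of matching rows) as its final value.
The 4 conditions we check (e.g. for female_male, a female goal with a male assist) are:
1. the current row's Event_Type is 'Goal';
2. the current row's Roster_Designation is 'Female' (the scorer);
3. the previous row's Roster_Designation is 'Male' (the assister);
4. the previous row's Event_Type is 'Assist'.
If all of the above are true, the row is counted.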
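If you want all four numbers at once, the same shift-based conditions can feed pd.crosstab instead of four separate len(...) calls. This is a minimal sketch, not part of the original answer, and it keeps the answer's assumption that an assist always sits in the row directly above the goal it set up:

```python
import pandas as pd

# Same test data as above; each combination is expected to appear once.
data = [['Assist', 'Male'], ['Goal', 'Female'], ['Assist', 'Female'], ['Goal', 'Male'],
        ['Goal', 'Male'], ['Goal', 'Female'], ['Assist', 'Male'], ['Goal', 'Male'],
        ['Assist', 'Female'], ['Goal', 'Female']]
df = pd.DataFrame(data, columns=['Event_Type', 'Roster_Designation'])

# prev holds the previous row's values, aligned to the current row (first row is NaN).
prev = df.shift()

# A goal counts as assisted when the row directly above it is an Assist.
assisted_goal = (df['Event_Type'] == 'Goal') & (prev['Event_Type'] == 'Assist')

# Cross-tabulate scorer gender (rows) against assister gender (columns).
counts = pd.crosstab(
    df.loc[assisted_goal, 'Roster_Designation'],
    prev.loc[assisted_goal, 'Roster_Designation'],
    rownames=['Scorer'],
    colnames=['Assister'],
)
print(counts)
```

Rows are the scorer's gender and columns the assister's, so counts.loc['Female', 'Male'] corresponds to female_male above.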
How about this:
assister_index = pd.Series(
df[df["Event_Type"] == "Goal"].index.to_series() - 1,
index=df.index,
dtype=pd.Int64Dtype(),
)
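# Blank out assister labels that are missing from df or that are not Assist rows
assister_index[~assister_index.isin(df.index[df["Event_Type"] == "Assist"])] = pd.NA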
>>> assister_index
4 <NA>
5 4
12 <NA>
13 12
46 <NA>
207095 <NA>
207108 <NA>
207109 207108
207118 <NA>
207119 207118
dtype: Int64
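The index - 1 lookup behaves differently from shift() when the index has gaps. The question's index (4, 5, 12, 13, 46, ...) is not contiguous, which suggests rows were dropped. shift() pairs each goal with whatever row happens to sit above it, while the label arithmetic only accepts the row labelled exactly one less. A small sketch on a hypothetical four-row frame, assuming consecutive labels mean consecutive events in the full data:

```python
import pandas as pd

# Hypothetical frame with a gap in the index (labels 1 and 2 are missing).
df = pd.DataFrame(
    {'Event_Type': ['Assist', 'Goal', 'Assist', 'Goal'],
     'Roster_Designation': ['Male', 'Female', 'Female', 'Male']},
    index=[0, 3, 4, 5],
)
goals = df['Event_Type'] == 'Goal'

# Positional: shift() looks at whatever row sits directly above.
by_position = goals & (df['Event_Type'].shift() == 'Assist')

# Label-based: the assist must carry the label (goal label - 1).
assist_labels = df.index[df['Event_Type'] == 'Assist']
by_label = goals & (df.index.to_series() - 1).isin(assist_labels)

print(by_position.tolist())  # [False, True, False, True]
print(by_label.tolist())     # [False, False, False, True]
```

Here shift() credits the goal at label 3 to the assist at label 0, even though labels 1 and 2 fall between them; the label-based check does not.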
df["Assister_Gender"] = assister_index.map(
lambda i: df.loc[i, "Roster_Designation"], na_action="ignore"
)
>>> df
Event_Type Roster_Designation Assister_Gender
4 Assist Male <NA>
5 Goal Female Male
12 Assist Female <NA>
13 Goal Male Female
46 Goal Male <NA>
207095 Goal Female <NA>
207108 Assist Male <NA>
207109 Goal Male Male
207118 Assist Female <NA>
207119 Goal Female Female
Now just count the rows that match the conditions you want, e.g.:
(
(df["Event_Type"] == "Goal")
& (df["Roster_Designation"] == "Female")
& (df["Assister_Gender"] == "Male")
).sum()
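To get all four combinations from the Assister_Gender column in one go, a groupby on scorer and assister gender works. This is a minimal sketch, not the answer's code; the frame below is hand-built from the rows shown above, with None standing in for the <NA> entries:

```python
import pandas as pd

# Frame mirroring the displayed rows, Assister_Gender already filled in.
df = pd.DataFrame(
    {'Event_Type': ['Assist', 'Goal', 'Assist', 'Goal', 'Goal',
                    'Goal', 'Assist', 'Goal', 'Assist', 'Goal'],
     'Roster_Designation': ['Male', 'Female', 'Female', 'Male', 'Male',
                            'Female', 'Male', 'Male', 'Female', 'Female'],
     'Assister_Gender': [None, 'Male', None, 'Female', None,
                         None, None, 'Male', None, 'Female']},
    index=[4, 5, 12, 13, 46, 207095, 207108, 207109, 207118, 207119],
)

# Keep only goals with a known assister, then count every (scorer, assister) pair.
assisted = df[(df['Event_Type'] == 'Goal') & df['Assister_Gender'].notna()]
pair_counts = assisted.groupby(['Roster_Designation', 'Assister_Gender']).size()
print(pair_counts)
```

Each entry of pair_counts is keyed by (scorer gender, assister gender), so pair_counts[('Female', 'Male')] gives the same number as the expression above.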