簡體   English   中英

如何根據 dataframe 中的條件計算字符串值的出現次數?

[英]How do I count the occurrence of string values based on a condition in a dataframe?

我有以下 dataframe:

        Event_Type Roster_Designation
4          Assist               Male
5            Goal             Female
12         Assist             Female
13           Goal               Male
46           Goal               Male
...           ...                ...
207095       Goal             Female
207108     Assist               Male
207109       Goal               Male
207118     Assist             Female
207119       Goal             Female

我想知道在男性的協助下女性進了多少球? 有多少女性進球是由女性助攻的? 然后反之亦然(例如,女性助攻 -> 男性進球,男性助攻 - 男性進球)。

import pandas as pd

# test data. expected result is 1 for each category
data = [['Assist', 'Male'], ['Goal', 'Female'], ['Assist', 'Female'], ['Goal', 'Male'], ['Goal', 'Male'], ['Goal', 'Female'], ['Assist', 'Male'], ['Goal', 'Male'], ['Assist', 'Female'], ['Goal', 'Female']]
df = pd.DataFrame(data, columns=['Event_Type', 'Roster_Designation'])

female_female = len(df[(df['Event_Type'] == 'Goal') & (df['Roster_Designation'] == 'Female') & (df['Roster_Designation'].shift() == 'Female') & (df['Event_Type'].shift() == 'Assist')])
female_male = len(df[(df['Event_Type'] == 'Goal') & (df['Roster_Designation'] == 'Female') & (df['Roster_Designation'].shift() == 'Male') & (df['Event_Type'].shift() == 'Assist')])
male_male = len(df[(df['Event_Type'] == 'Goal') & (df['Roster_Designation'] == 'Male') & (df['Roster_Designation'].shift() == 'Male') & (df['Event_Type'].shift() == 'Assist')])
male_female = len(df[(df['Event_Type'] == 'Goal') & (df['Roster_Designation'] == 'Male') & (df['Roster_Designation'].shift() == 'Female') & (df['Event_Type'].shift() == 'Assist')])

print(f"Female goals assisted by Females: {female_female}")
print(f"Female goals assisted by Males: {female_male}")
print(f"Male goals assisted by Males: {male_male}")
print(f"Male goals assisted by Females: {male_female}")

Output:
Female goals assisted by Females: 1
Female goals assisted by Males: 1
Male goals assisted by Males: 1
Male goals assisted by Females: 1

var 賦值說明:

我們將所有內容包裝在 len() 中,以便我們分配一個數字作為最終值

我們正在檢查的 4 個條件(例如女性 -> 男性)是:

  • 關鍵Event_Type的當前值是否 == 目標?
  • 關鍵Roster_Designation的當前值是否 == 女性?
  • Roster_Designation == Male 的前一個值(移位)?
  • Event_Type == 輔助的先前值(移位)?

如果以上都是肯定的,那就數一數。

這個怎么樣:

assister_index = pd.Series(
    df[df["Event_Type"] == "Goal"].index.to_series() - 1,
    index=df.index,
    dtype=pd.Int64Dtype(),
)
assister_index[~assister_index.isin(df.index)] = pd.NA

>>> assister_index
4           <NA>
5              4
12          <NA>
13            12
46          <NA>
207095      <NA>
207108      <NA>
207109    207108
207118      <NA>
207119    207118
dtype: Int64

df["Assister_Gender"] = assister_index.map(
    lambda i: df.loc[i, "Roster_Designation"], na_action="ignore"
) 
>>> df
       Event_Type Roster_Designation Assister_Gender
4          Assist               Male            <NA>
5            Goal             Female            Male
12         Assist             Female            <NA>
13           Goal               Male          Female
46           Goal               Male            <NA>
207095       Goal             Female            <NA>
207108     Assist               Male            <NA>
207109       Goal               Male            Male
207118     Assist             Female            <NA>
207119       Goal             Female          Female    

現在只需計算符合您所需條件的行,例如,

(
    (df["Event_Type"] == "Goal")
    & (df["Roster_Designation"] == "Female")
    & (df["Assister_Gender"] == "Male")
).sum()

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM