
How to get frequency count for a column value, sorted by a categorical value in another column

I have a pandas dataframe that includes two columns, vessel name and delay indicator. Vessel name is a string name of a vessel, and delay indicator is either a 0 or 1 (boolean).

My DataFrame:

import pandas as pd

df = pd.DataFrame({
    "Vessel.Name": ["Spirit of British Columbia", "Queen of New Westminster", "Spirit of Vancouver Island", "Coastal Celebration", "Spirit of British Columbia"],
    "Delay.Indicator": [0, 0, 0, 1, 0]
})

How it looks:

Vessel.Name                 Delay.Indicator
Spirit of British Columbia  0
Queen of New Westminster    0
Spirit of Vancouver Island  0
Coastal Celebration         1
Spirit of British Columbia  0 

My goal is to get a DataFrame with one row per distinct vessel name and two new columns: the number of times that vessel appears, and the total number of 1s in its delay indicator. Is there a pandas method for this, or should I iterate through Python lists?

A simple groupby with aggregate functions applied should do the trick:

df.groupby("Vessel.Name")["Delay.Indicator"].agg(['count', 'sum'])

Output:

                            count   sum
Vessel.Name     
Coastal Celebration         1       1
Queen of New Westminster    1       0
Spirit of British Columbia  2       0
Spirit of Vancouver Island  1       0
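
If you want the result as a flat DataFrame with descriptive column names, as the question asks, one option is named aggregation followed by reset_index. This is only a sketch: the column names trips and delays are made up for illustration and are not from the original post. Note that 'count' skips missing values in Delay.Indicator; 'size' would count every row instead.

```python
import pandas as pd

df = pd.DataFrame({
    "Vessel.Name": [
        "Spirit of British Columbia",
        "Queen of New Westminster",
        "Spirit of Vancouver Island",
        "Coastal Celebration",
        "Spirit of British Columbia",
    ],
    "Delay.Indicator": [0, 0, 0, 1, 0],
})

# Named aggregation: each keyword becomes an output column name,
# and its value is the aggregation applied to Delay.Indicator.
# "trips" and "delays" are hypothetical names chosen for this example.
summary = (
    df.groupby("Vessel.Name")["Delay.Indicator"]
      .agg(trips="count", delays="sum")
      .reset_index()  # turn the Vessel.Name index back into a regular column
)

print(summary)
```

Because the indicator is 0 or 1, summing it gives the number of delayed sailings per vessel directly.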

