
Python: Group two columns together and find sum of third column

Really new to Python and need a bit of help with a question I have to complete.

I need to find the net amount earned per time period (month/year), based on user input for the month (MM) and year (YYYY).

I have the inputs as follows:

year_value = int(input("Year (YYYY): "))
month_value = int(input("Month (MM): "))

My dataframe looks like this:

Race_Course   Horse Name      Year   Month   Day   Amount_won_lost   Won/Lost
Aintree       Red Rum         2017   5       12    11.58             won
Punchestown   Camelot         2016   12      22    122.52            won
Sandown       Beef of Salmon  2016   11      17    20.0              lost
Ayr           Corbiere        2016   11      3     25.0              lost
Fairyhouse    Red Rum         2016   12      2     65.75             won
Ayr           Camelot         2017   3       11    12.05             won
Aintree       Hurricane Fly   2017   5       12    11.58             won
Punchestown   Beef or Salmon  2016   12      22    112.52            won
Sandown       Aldaniti        2016   11      17    10.0              lost
etc.
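To experiment with the answers below, the nine sample rows shown above can be loaded into a DataFrame. This is a minimal sketch that assumes the data is not yet read from a file; the rows behind "etc." are unknown and are not included.

```python
import pandas as pd

# The nine sample rows from the question (the "etc." rows are not known).
race_data = pd.DataFrame(
    {
        "Race_Course": ["Aintree", "Punchestown", "Sandown", "Ayr", "Fairyhouse",
                        "Ayr", "Aintree", "Punchestown", "Sandown"],
        "Horse Name": ["Red Rum", "Camelot", "Beef of Salmon", "Corbiere", "Red Rum",
                       "Camelot", "Hurricane Fly", "Beef or Salmon", "Aldaniti"],
        "Year": [2017, 2016, 2016, 2016, 2016, 2017, 2017, 2016, 2016],
        "Month": [5, 12, 11, 11, 12, 3, 5, 12, 11],
        "Day": [12, 22, 17, 3, 2, 11, 12, 22, 17],
        "Amount_won_lost": [11.58, 122.52, 20.0, 25.0, 65.75,
                            12.05, 11.58, 112.52, 10.0],
        "Won/Lost": ["won", "won", "lost", "lost", "won",
                     "won", "won", "won", "lost"],
    }
)

print(race_data)
```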

I have two problems:

  1. How do I group the data based on the inputs and sum the matching Amount_won_lost values?
  2. How do I make sure that, when summing, the Amount_won_lost value is negative when Won/Lost = lost and stays positive when Won/Lost = won?

Any help would be very much appreciated! I've been stuck on this for a few hours and can't seem to figure it out.

The output should look something like this, but anything that prints the result would be perfect; I don't mind how it looks:

Year    Month    Amount_won_lost
2016    11       €-55.00
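If the exact layout above is wanted, an f-string with padding can produce it once the total is known. This is a sketch only: `format_result` is a hypothetical helper name (not part of pandas), and it assumes the total (for example -55.0) has already been computed by one of the answers.

```python
def format_result(year: int, month: int, total: float) -> str:
    """Render one result row in the layout shown in the question.

    format_result is a hypothetical helper, not a pandas function.
    """
    # Left-align each field to the column widths used in the question's example.
    header = f"{'Year':<8}{'Month':<9}Amount_won_lost"
    row = f"{year:<8}{month:<9}€{total:.2f}"  # two decimals, euro sign as shown
    return header + "\n" + row


print(format_result(2016, 11, -55.0))
```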

Please try this:

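# Keep only the rows for the requested year, then for the requested month
by_year = race_data[race_data['Year'] == year_value]
by_month = by_year[by_year['Month'] == month_value]  # compare Month with month_value, not year_value
print(by_month['Amount_won_lost'].sum())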

I hope it helps.

P.S. race_data is a pandas DataFrame.
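The two filtering steps can also be written as one boolean mask, or with `DataFrame.query`, where `@` refers to local Python variables. A sketch assuming a small subset of the question's rows and hard-coded stand-ins for the `input()` values. Note that this answer on its own sums the raw amounts, so the lost races for 2016/11 add up to 55.0 rather than -55.0.

```python
import pandas as pd

# A few rows from the question's table, enough to demonstrate the filter.
race_data = pd.DataFrame(
    {
        "Year": [2017, 2016, 2016, 2016, 2016],
        "Month": [5, 12, 11, 11, 11],
        "Amount_won_lost": [11.58, 122.52, 20.0, 25.0, 10.0],
    }
)

year_value, month_value = 2016, 11  # stand-ins for the input() values

# One mask filtering on both columns at once.
mask = (race_data["Year"] == year_value) & (race_data["Month"] == month_value)
total = race_data.loc[mask, "Amount_won_lost"].sum()

# Equivalent filter written with DataFrame.query.
total_q = race_data.query("Year == @year_value and Month == @month_value")[
    "Amount_won_lost"
].sum()

print(total, total_q)  # raw amounts only: no sign handling yet
```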

You can first change the signs of the Amount_won_lost column by using pd.DataFrame.apply().

For example, the following line:

df["Amount_won_lost"] = df.apply(lambda x: -x["Amount_won_lost"] \
    if x["Won/Lost"] == "lost" else x["Amount_won_lost"], axis = 1)

This replaces the Amount_won_lost column with a negative value where the horse lost and a positive value where it won.
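A vectorised alternative to the row-wise `apply` is `numpy.where`. Because the `apply` line overwrites Amount_won_lost in place, running it a second time would flip the lost amounts back to positive. The sketch below instead writes to a new column; the name `Signed_amount` is our own choice, and the frame is a small assumed subset of the question's data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Year": [2016, 2016, 2016, 2016],
        "Month": [11, 11, 12, 11],
        "Amount_won_lost": [20.0, 25.0, 65.75, 10.0],
        "Won/Lost": ["lost", "lost", "won", "lost"],
    }
)

# Negative where the race was lost, unchanged otherwise. Writing to a new
# column keeps the raw amounts intact, so re-running this is harmless.
df["Signed_amount"] = np.where(
    df["Won/Lost"] == "lost", -df["Amount_won_lost"], df["Amount_won_lost"]
)

print(df)
```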

Then, using the sum() function mentioned in the other answer, you can get the total Amount_won_lost for the given year and month.

The following selects all rows matching the year and month entered:

df[(df["Year"] == year_value) & (df["Month"] == month_value)]

For year 2016 and month 11, after the sign change above, the output would be:

  Race_Course       HorseName  Year  Month  Day  Amount_won_lost Won/Lost
2     Sandown  Beef of Salmon  2016     11   17            -20.0     lost
3         Ayr        Corbiere  2016     11    3            -25.0     lost
8     Sandown        Aldaniti  2016     11   17            -10.0     lost

print(df[(df["Year"] == year_value) & (df["Month"] == month_value)]["Amount_won_lost"].sum())

This prints -55.0.

If you instead want the sums for every month of every year, without using user input, the groupby function is your best bet!
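A minimal `groupby` sketch of that idea, assuming the nine sample rows from the question (only the columns it needs). It applies the sign change on a copy via `assign`, then sums per (Year, Month) pair, giving one row per month that appears in the data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Year": [2017, 2016, 2016, 2016, 2016, 2017, 2017, 2016, 2016],
        "Month": [5, 12, 11, 11, 12, 3, 5, 12, 11],
        "Amount_won_lost": [11.58, 122.52, 20.0, 25.0, 65.75,
                            12.05, 11.58, 112.52, 10.0],
        "Won/Lost": ["won", "won", "lost", "lost", "won",
                     "won", "won", "won", "lost"],
    }
)

# Lost races count as negative; assign() returns a copy, so df is unchanged.
signed = np.where(df["Won/Lost"] == "lost", -df["Amount_won_lost"], df["Amount_won_lost"])

monthly = (
    df.assign(Amount_won_lost=signed)
    .groupby(["Year", "Month"], as_index=False)["Amount_won_lost"]
    .sum()
)

print(monthly)
```

A single month can then be picked out of `monthly` with the same Year/Month mask used in the answers above.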
