New list based on 3 other lists

Question

Starting with a CSV file with the columns ['race_number', 'number_of_horses_bet_on', 'odds'], I would like to calculate an extra column called 'desired_output'.

The 'desired_output' column is computed as follows:

For 'race_number' 1, 'number_of_horses_bet_on' is 2, so only the first 2 'odds' of that race are copied into 'desired_output'. The remaining rows for 'race_number' 1 get 0. Then we move on to 'race_number' 2 and the cycle repeats with that race's 'number_of_horses_bet_on'.
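To make the rule concrete, here is a minimal sketch that works on three plain lists. The helper name `keep_first_n_odds` and the sample values are hypothetical, since the question does not show the contents of test.csv. It counts how many rows of each race have been seen so far and keeps the odds only while that count is below the number of horses bet on.

```python
def keep_first_n_odds(race_numbers, horses_bet_on, odds):
    """Hypothetical helper: keep the first n odds of each race (n = horses bet on), zero the rest."""
    result = []
    seen = {}  # how many rows of each race have been visited so far
    for race, n, odd in zip(race_numbers, horses_bet_on, odds):
        position = seen.get(race, 0)
        result.append(odd if position < n else 0)
        seen[race] = position + 1
    return result


# Hypothetical sample data (not taken from the question's CSV)
race_number = [1, 1, 1, 1, 2, 2, 2]
number_of_horses_bet_on = [2, 2, 2, 2, 1, 1, 1]
odds = [3.5, 4.0, 6.0, 10.0, 2.5, 5.0, 8.0]

print(keep_first_n_odds(race_number, number_of_horses_bet_on, odds))
# [3.5, 4.0, 0, 0, 2.5, 0, 0]
```

Because the per-race counter lives in a dictionary, this version does not require the rows of a race to be next to each other.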

Code I have tried includes:

import pandas as pd

df=pd.read_csv('test.csv')

desired_output=[]
count=0
for i in df.number_of_horses_bet_on:
    for j in df.odds:
        if count<i:
            desired_output.append(j)
            count+=1
        else:
            desired_output.append(0)
print(desired_output)

and also

df['desired_output']=df.odds.apply(lambda x: x if count<number_of_horses_bet_on else 0)

Neither of these produces the expected 'desired_output' column.

I realise the 'count' in the lambda above is misplaced, but hopefully you can see what I am after. Thanks.
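The first attempt above fails for two reasons: `count` is never reset when a new race starts, and the nested loop walks over every odds value once per row, so it appends len(df) × len(df) values instead of one per row. A minimal sketch of the same counter idea, corrected to a single pass that restarts the counter at each new race, could look like this. It uses an inline DataFrame with hypothetical values in place of test.csv and assumes the rows of each race are contiguous.

```python
import pandas as pd

# Hypothetical data standing in for test.csv (the question does not show its contents)
df = pd.DataFrame({
    'race_number': [1, 1, 1, 1, 2, 2, 2],
    'number_of_horses_bet_on': [2, 2, 2, 2, 1, 1, 1],
    'odds': [3.5, 4.0, 6.0, 10.0, 2.5, 5.0, 8.0],
})

desired_output = []
count = 0
previous_race = None
# One pass over the rows; assumes the rows of each race are contiguous
for race, n, odd in zip(df['race_number'], df['number_of_horses_bet_on'], df['odds']):
    if race != previous_race:
        count = 0  # a new race starts: restart the counter
        previous_race = race
    desired_output.append(odd if count < n else 0)
    count += 1

df['desired_output'] = desired_output
print(df)
```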

Answer 1

I'm going to approach it a bit differently. Here is the plan:

Get a list of all race numbers.
For each race_number, extract its number_of_horses_bet_on.
Build a list of 1s and 0s: number_of_horses_bet_on 1s followed by 0s for the rest of that race's rows.
Multiply this list by the odds column.

import pandas as pd

df=pd.read_csv('test.csv')

mask = []
races = df['race_number'].unique().tolist() # unique list of all races
for race in races:
    # filter the dataframe by the race number
    df_race = df[df['race_number'] == race]
    # assuming number_of_horses_bet_on is the same on every row of a race, take it from the first row
    number_of_horses = df_race['number_of_horses_bet_on'].iloc[0]
    # this mask will contain a list of 1s and 0s, for example for race 1 it'll be [1,1,0,0,0]
    # (masks are concatenated race by race, so the rows of each race must be contiguous in df)
    mask = mask + [1] * number_of_horses + [0] * (len(df_race) - number_of_horses)

df['mask'] = mask
df['desired_output'] = df['mask'] * df['odds']
del df['mask']

print(df)

This assumes that, for each race, number_of_horses_bet_on is less than or equal to the number of rows for that race; otherwise you may need to use min/max to get correct results.
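A vectorized alternative, sketched here with hypothetical data rather than the real test.csv, numbers the rows within each race using pandas' `groupby(...).cumcount()` and keeps `odds` wherever that position is below `number_of_horses_bet_on`. Because the comparison is made row by row, the rows of a race do not need to be contiguous, and a race with more horses bet on than rows simply keeps all of its odds instead of producing a mask of the wrong length.

```python
import pandas as pd

# Hypothetical data standing in for test.csv
df = pd.DataFrame({
    'race_number': [1, 1, 1, 1, 2, 2, 2],
    'number_of_horses_bet_on': [2, 2, 2, 2, 1, 1, 1],
    'odds': [3.5, 4.0, 6.0, 10.0, 2.5, 5.0, 8.0],
})

# Position of each row within its race (0, 1, 2, ...), following file order
position = df.groupby('race_number').cumcount()

# Keep the odds for the first number_of_horses_bet_on rows of each race, otherwise 0
df['desired_output'] = df['odds'].where(position < df['number_of_horses_bet_on'], 0)
print(df)
```

"First" here means first in the order the rows appear in the file, which matches the loop-based answer.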

Accepted answer, 2020-08-10 21:50:08

New list based on 3 other lists

Question

1 answers

solution1 1 ACCPTED 2020-08-10 21:50:08

solution1
1 ACCPTED 2020-08-10 21:50:08