New list based on 3 other lists

Starting with a CSV file with the columns ['race_number', 'number_of_horses_bet_on', 'odds'], I would like to add/calculate an extra column called 'desired_output'.

The 'desired_output' column is computed as follows:

  • For 'race_number' 1, 'number_of_horses_bet_on' = 2, so only the first 2 'odds' are included in the 'desired_output' column. The remaining values for 'race_number' 1 are 0. Then we move on to 'race_number' 2 and the cycle repeats.
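To make the rule concrete, here is a minimal plain-Python sketch. It assumes the three columns have already been read into equal-length lists. The function name `desired_output` and the sample values are hypothetical, because the original example table is not available. A per-race counter keeps track of how many rows of each race have been seen, so rows stay in file order and each race restarts its own count.

```python
def desired_output(race_numbers, horses_bet_on, odds):
    """Keep the first `horses_bet_on` odds of each race; replace the rest with 0."""
    seen = {}  # race_number -> number of rows of that race seen so far
    result = []
    for race, n_bet, odd in zip(race_numbers, horses_bet_on, odds):
        position = seen.get(race, 0)  # 0-based position of this row within its race
        result.append(odd if position < n_bet else 0)
        seen[race] = position + 1
    return result


# Hypothetical sample values, for illustration only
races = [1, 1, 1, 2, 2, 2]
bets = [2, 2, 2, 1, 1, 1]
odds = [3.5, 4.0, 6.0, 2.5, 5.0, 8.0]
print(desired_output(races, bets, odds))  # [3.5, 4.0, 0, 2.5, 0, 0]
```

Because the counter is stored per race in a dict, this sketch also works if rows from different races are interleaved.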


Code I have tried includes:

import pandas as pd

df=pd.read_csv('test.csv')

desired_output=[]
count=0
for i in df.number_of_horses_bet_on:
    for j in df.odds:
        if count<i:
            desired_output.append(j)
            count+=1
        else:
            desired_output.append(0)
print(desired_output)

and also

df['desired_output']=df.odds.apply(lambda x: x if count<number_of_horses_bet_on else 0)
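A lambda passed to `df.odds.apply` only ever sees a single odds value, so it cannot know a row's position within its race or that race's `number_of_horses_bet_on`. One way to keep the row-wise lambda idea is sketched below. It first numbers the rows inside each race with pandas' `groupby(...).cumcount()`, then applies the lambda across whole rows with `axis=1`. The inline DataFrame is hypothetical sample data standing in for `test.csv`, and the helper column name `position` is an assumption.

```python
import pandas as pd

# Hypothetical sample data standing in for test.csv
df = pd.DataFrame({
    "race_number": [1, 1, 1, 2, 2, 2],
    "number_of_horses_bet_on": [2, 2, 2, 1, 1, 1],
    "odds": [3.5, 4.0, 6.0, 2.5, 5.0, 8.0],
})

# 0-based position of each row within its race: 0, 1, 2, ... restarting per race_number
position = df.groupby("race_number").cumcount()

# Row-wise lambda: keep the odds while the row's position is below the number of horses bet on
df["desired_output"] = df.assign(position=position).apply(
    lambda row: row["odds"] if row["position"] < row["number_of_horses_bet_on"] else 0,
    axis=1,
)
print(df)
```

Row-wise `apply` is easy to read but slow on large frames, because it calls the lambda once per row.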

Neither of these produces the 'desired_output' column.

I realise the 'count' in the lambda above is misplaced, but hopefully you can see what I am after. Thanks.

I'm going to do it a bit differently. This is what I'll do:

  • get a list of all race_number values
  • for each race_number, extract its number_of_horses_bet_on
  • create a list of 1s and 0s containing number_of_horses_bet_on 1s, with the rest of that race's rows set to 0
  • multiply this list with the odds column
import pandas as pd

df=pd.read_csv('test.csv')

mask = []
races = df['race_number'].unique().tolist() # unique races, in order of first appearance; assumes each race's rows are contiguous in the file
for race in races:
    # filter the dataframe by the race number
    df_race = df[df['race_number'] == race]
    # assuming number_of_horses_bet_on is the same on every row of a race, we take it from the first row
    number_of_horses = df_race['number_of_horses_bet_on'].iloc[0]
    # this mask will contain a list of 1s and 0s, for example for race 1 it'll be [1,1,0,0,0]
    mask = mask + [1] * number_of_horses + [0] * (len(df_race) - number_of_horses)

df['mask'] = mask
df['desired_output'] = df['mask'] * df['odds']
del df['mask']

print(df)

This assumes that, for each race, number_of_horses_bet_on is less than or equal to the number of rows for that race; otherwise you might need to use min/max to get proper results.
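For comparison, here is a vectorized sketch. It uses a different technique from the mask-building loop: pandas' `groupby(...).cumcount()` numbers each row within its race, and `Series.where` keeps the odds only where that number is below `number_of_horses_bet_on`. It does not depend on each race's rows being contiguous, and a race with fewer rows than horses bet on needs no min/max adjustment. The inline data is hypothetical and deliberately interleaves races; "first N rows" still means first in file order.

```python
import pandas as pd

# Hypothetical sample data: races interleaved, and race 3 has fewer rows than horses bet on
df = pd.DataFrame({
    "race_number": [1, 2, 1, 2, 1, 3],
    "number_of_horses_bet_on": [2, 1, 2, 1, 2, 4],
    "odds": [3.5, 2.5, 4.0, 5.0, 6.0, 7.0],
})

# cumcount numbers rows 0, 1, 2, ... within each race, in the order they appear
within_race = df.groupby("race_number").cumcount()

# Keep odds for the first number_of_horses_bet_on rows of each race, 0 elsewhere
keep = within_race < df["number_of_horses_bet_on"]
df["desired_output"] = df["odds"].where(keep, 0)

print(df["desired_output"].tolist())  # [3.5, 2.5, 4.0, 0.0, 0.0, 7.0]
```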

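Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.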

 
© 2020-2024 STACKOOM.COM