Suppose I have a pandas data frame like:
         Date  Type  Rate  Load
0  2017-01-02  Rain    23    10
1  2017-01-02   Dry    30    15
2  2017-01-02  Rain    32    20
....
I also have a cost function cost(Type, Rate) that returns a real number.
How can I create a new column that, for each row, holds the sum of Load over all other rows that have the same Date and a lower cost() than that row?
For example, if the cost function is simply:
def cost(Type, Rate):
    if Type == 'Rain':
        return Rate / 12
    else:
        return Rate / 17
The output will be:
         Date  Type  Rate  Load  Output
0  2017-01-02  Rain    23    10  15
1  2017-01-02   Dry    30    15  0
2  2017-01-02  Rain    32    20  25 (= 15 + 10)
....
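To make the expected numbers concrete, here is a minimal brute-force sketch that reproduces the Output column for the three sample rows. It assumes the Date column holds plain strings and that only the three rows shown exist; the names costs, loads and expected are illustrative and not part of the original question.

```python
import pandas as pd

def cost(Type, Rate):
    if Type == "Rain":
        return Rate / 12
    return Rate / 17

# Sample rows from the question (assumed to be the whole frame here)
df = pd.DataFrame({
    "Date": ["2017-01-02"] * 3,
    "Type": ["Rain", "Dry", "Rain"],
    "Rate": [23, 30, 32],
    "Load": [10, 15, 20],
})

dates = df["Date"].tolist()
loads = df["Load"].tolist()
# Roughly 1.917, 1.765 and 2.667 for the three rows
costs = [cost(t, r) for t, r in zip(df["Type"], df["Rate"])]

# For each row, add up Load of same-date rows with a strictly lower cost
expected = [
    sum(loads[j] for j in range(len(df))
        if dates[j] == dates[i] and costs[j] < costs[i])
    for i in range(len(df))
]
print(expected)  # [15, 0, 25]
```

Row 1 (Dry, 30) has the lowest cost, so nothing is summed for it, while row 2 has the highest cost and collects the Load of both other rows.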
Update. The approach I'm currently considering is to first create a new column holding the cost of each row, and then create a second column that, for each row, sums the Load of all records that have the same date and a lower cost. Is there a faster way that combines both steps?
df["Cost"] = [cost(t, r) for t, r in zip(df["Type"], df["Rate"])]
df["Output"] = [
    df.loc[(df["Date"] == row["Date"]) & (df["Cost"] < row["Cost"]), "Load"].sum()
    for _, row in df.iterrows()
]
You could try this with df.to_records():
print(df)
cost = lambda Type, Rate: Rate / 12 if Type == 'Rain' else Rate / 17
# Each record is (index, Date, Type, Rate, Load)
records = df.to_records()
df['Output'] = [
    sum(j[4] for j in records
        if j[1] == i[1] and list(i) != list(j) and cost(j[2], j[3]) < cost(i[2], i[3]))
    for i in records
]
print(df)
Output:
df:
         Date  Type  Rate  Load
0  2017-01-01  Rain    23    10
1  2017-01-01   Dry    22    10
2  2017-01-01  Rain    25    10
3  2017-01-02   Dry    30    15
4  2017-01-02  Rain    32    20
df with output column:
         Date  Type  Rate  Load  Output
0  2017-01-01  Rain    23    10      10
1  2017-01-01   Dry    22    10       0
2  2017-01-01  Rain    25    10      20
3  2017-01-02   Dry    30    15       0
4  2017-01-02  Rain    32    20      15
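The nested loop over records compares every pair of rows, which can get slow on large frames. As a possible alternative (a sketch, not the method from the answer above), the same result can be obtained by summing Load per (Date, Cost) pair and taking a cumulative sum within each date. The function name load_below_cost and the helper column Cost are hypothetical, and the sketch assumes that rows with an identical cost should not count toward each other, matching the strict "less than" in the question.

```python
import pandas as pd

def cost(Type, Rate):
    return Rate / 12 if Type == "Rain" else Rate / 17

def load_below_cost(df):
    """Return a copy of df with Cost and Output columns (hypothetical helper)."""
    out = df.copy()
    out["Cost"] = [cost(t, r) for t, r in zip(out["Type"], out["Rate"])]
    # Total Load per (Date, Cost); groupby sorts the keys, so costs ascend within a date
    per_cost = out.groupby(["Date", "Cost"])["Load"].sum()
    # Running total within each date, minus the current cost level itself,
    # leaves the Load of strictly cheaper rows
    below = per_cost.groupby(level="Date").cumsum() - per_cost
    below = below.rename("Output").reset_index()
    # A left merge keeps the original row order (the index is reset to 0..n-1)
    return out.merge(below, on=["Date", "Cost"], how="left")

df = pd.DataFrame({
    "Date": ["2017-01-01"] * 3 + ["2017-01-02"] * 2,
    "Type": ["Rain", "Dry", "Rain", "Dry", "Rain"],
    "Rate": [23, 22, 25, 30, 32],
    "Load": [10, 10, 10, 15, 20],
})
result = load_below_cost(df)
print(result[["Date", "Type", "Rate", "Load", "Output"]])
```

On the five-row example this gives the same Output values as the to_records() version (10, 0, 20, 0, 15), while doing one groupby pass instead of a comparison for every pair of rows.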