Conditional Iteration over a Pandas Dataframe
I am trying to loop over a pandas DataFrame to build constraints that meet specific conditions in an optimization task.
Let me provide some background and what I have done so far.
The table below is a sample of the first 11 rows (index 0 to 10) of my input data (named df_long) after loading and melting it with pandas. I have 150 rows in my actual dataset.
    Hour  TypeofTask  TaskFrequency  TotalTaskatSpecificHour
0   08    A           5              50
1   09    D           8              30
2   08    D           7              50
3   10    C           4              20
4   09    B           6              30
5   08    B           9              50
6   10    A           2              20
7   09    D           1              30
8   08    C           3              50
9   08    E           2              50
10  09    A           7              30
I have also created decision variables, i.e. x0, x1, x2, ..., xn, one for each row of the input data set above, using the loop below:
import pulp

decision_variables = []
for rownum, row in df_long.iterrows():
    # One non-negative integer variable per row, named x0, x1, ...
    name = 'x' + str(rownum)
    variable = pulp.LpVariable(name, lowBound=0, cat='Integer')
    decision_variables.append(variable)
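A dictionary keyed by the row index is a possible alternative to the list. This is a minimal sketch, assuming df_long has a unique index; the three-row frame below is made up for illustration only. With a dictionary, the variable for a given row can be looked up directly from its index label, even after the frame has been filtered.

```python
import pandas as pd
import pulp

# Hypothetical three-row stand-in for df_long; only the index matters here.
df_long = pd.DataFrame({"Hour": ["08", "09", "10"], "TaskFrequency": [5, 8, 4]})

# One non-negative integer variable per row, keyed by the row's index label.
decision_variables = {
    idx: pulp.LpVariable(f"x{idx}", lowBound=0, cat="Integer")
    for idx in df_long.index
}

# Look up the variable that belongs to the row with index label 2.
print(decision_variables[2].name)
```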
My actual question:
I want to loop through the pandas DataFrame, find all the TaskFrequency values that occur at a specific hour, and multiply each TaskFrequency by the decision variable for its row. The resulting sum should be less than or equal to the TotalTaskatSpecificHour for that hour. For example, the expression for Hour 10 would be:
4*x3 + 2*x6 <= 20
So far I have been able to do this:
to = ""
for rownum, row in df_long.iterrows():
    for i, wo in enumerate(decision_variables):
        if rownum == i:
            # Multiply the row's frequency by the variable at the same position
            formula = row['TaskFrequency']*wo
            to += formula
prob += to
This gave me:
5*x0 + 8*x1 + 7*x2 + 4*x3 + 6*x4 + 9*x5 + 2*x6 + 1*x7 + 3*x8 + 2*x9 + 7*x10
I also tried this:
for rownum, row in df_long.iterrows():
    for i, wo in enumerate(decision_variables):
        for x, y, z in zip(df_long['Hour'], df_long['TypeofTask'], df_long['TaskFrequency']):
            if rownum == i:
                formula1 = row['TaskFrequency']*wo
With this I just get 7*x10.
What I wish to get is the same kind of expression, but for one specific Hour at a time instead of everything combined. For example, for Hour 10 it should be:
4*x3 + 2*x6 <= 20
and for Hour 9 it should be:
8*x1 + 6*x4 + 1*x7 + 7*x10 <= 30
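One way to see which rows end up in each hour's constraint is to group the frame on Hour. The following is a minimal sketch that only builds the text of each constraint and does not touch PuLP. It assumes that Hour is stored as a zero-padded string and that TotalTaskatSpecificHour is the same for every row of a given hour; the frame is rebuilt from the sample rows above.

```python
import pandas as pd

# Sample rows from the table above; Hour as a zero-padded string is an assumption.
df_long = pd.DataFrame({
    "Hour": ["08", "09", "08", "10", "09", "08", "10", "09", "08", "08", "09"],
    "TypeofTask": list("ADDCBBADCEA"),
    "TaskFrequency": [5, 8, 7, 4, 6, 9, 2, 1, 3, 2, 7],
    "TotalTaskatSpecificHour": [50, 30, 50, 20, 30, 50, 20, 30, 50, 50, 30],
})

constraints = {}
for hour, group in df_long.groupby("Hour"):
    # Each term pairs a row's TaskFrequency with the variable named after its index.
    terms = [f"{freq}*x{idx}" for idx, freq in group["TaskFrequency"].items()]
    # The hourly limit is assumed constant within the group, so take the first value.
    limit = group["TotalTaskatSpecificHour"].iloc[0]
    constraints[hour] = " + ".join(terms) + f" <= {limit}"

for hour, text in constraints.items():
    print(hour, ":", text)
```

Grouping keeps the original row order inside each hour, so the terms appear in the same order as in the expressions written out above.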
I look forward to your suggestions and help.
Regards,
Diva
You would want one result per hour. In essence you don't need to apply a function row by row; instead, condense the DataFrame with groupby, as in the other answer, or slice it by hour. I think groupby is the standard way to do it, but slicing with a lambda is a no-brainer:
def fun1(df, Hours, prod):
    # row.name is the row's index label
    return sum(df[df['Hour']==Hours].apply(lambda row: int(row.name)*row['TaskFrequency'], axis=1)) <= prod
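The groupby route can also be wired into PuLP directly. This is a sketch under stated assumptions rather than a definitive implementation: the problem name task_allocation and the constraint names hour_08, hour_09, hour_10 are made up, LpMinimize is an arbitrary choice because the question does not give an objective, Hour is assumed to be a zero-padded string, and the hourly limit is assumed constant within each hour.

```python
import pandas as pd
import pulp

# Sample rows from the question; Hour as a zero-padded string is an assumption.
df_long = pd.DataFrame({
    "Hour": ["08", "09", "08", "10", "09", "08", "10", "09", "08", "08", "09"],
    "TypeofTask": list("ADDCBBADCEA"),
    "TaskFrequency": [5, 8, 7, 4, 6, 9, 2, 1, 3, 2, 7],
    "TotalTaskatSpecificHour": [50, 30, 50, 20, 30, 50, 20, 30, 50, 50, 30],
})

# Hypothetical problem; the sense is arbitrary since no objective is specified.
prob = pulp.LpProblem("task_allocation", pulp.LpMinimize)

# One non-negative integer variable per row, keyed by the row's index label.
decision_variables = {
    idx: pulp.LpVariable(f"x{idx}", lowBound=0, cat="Integer")
    for idx in df_long.index
}

for hour, group in df_long.groupby("Hour"):
    # Sum of TaskFrequency * x_i over the rows that belong to this hour.
    lhs = pulp.lpSum(
        int(freq) * decision_variables[idx]
        for idx, freq in group["TaskFrequency"].items()
    )
    # The hourly limit is assumed identical on every row of the group.
    limit = int(group["TotalTaskatSpecificHour"].iloc[0])
    # Adding a (constraint, name) tuple registers a named constraint.
    prob += lhs <= limit, f"hour_{hour}"

for name, constraint in prob.constraints.items():
    print(name, ":", constraint)
```

Keying the variables by index label removes the need for the inner enumerate loop from the question, because each row's variable is found by direct lookup.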