
Conditional Iteration over a Pandas Dataframe

I am trying to loop over a pandas DataFrame to build constraints that meet specific conditions in an optimization task.

Let me provide some background and show what I have done so far.

The table below shows the first 11 rows of my input data (named df_long) after loading and melting it with pandas. My actual dataset has 150 rows.

    Hour  TypeofTask  TaskFrequency  TotalTaskatSpecificHour
0   08    A           5              50
1   09    D           8              30
2   08    D           7              50
3   10    C           4              20
4   09    B           6              30
5   08    B           9              50
6   10    A           2              20
7   09    D           1              30
8   08    C           3              50
9   08    E           2              50
10  09    A           7              30

I have also created one decision variable (x0, x1, x2, ..., xn) for each row of the input data set above, using the loop below:

import pulp

decision_variables = []
for rownum, row in df_long.iterrows():
    # One non-negative integer variable per row, named x0, x1, ...
    variable = pulp.LpVariable('x' + str(rownum), lowBound=0, cat='Integer')
    decision_variables.append(variable)

My actual question:

I want to loop through the DataFrame to find every TaskFrequency that occurred at a specific hour, then multiply each TaskFrequency by the decision variable for its row. The resulting sum should be less than or equal to the TotalTaskatSpecificHour for that hour. For example, the expression for Hour 10 would be:

4*x3 + 2*x6 <= 20

So far I have been able to do this:

to = ""
for rownum, row in df_long.iterrows():
    for i, wo in enumerate(decision_variables):
        if rownum == i:
            formula = row['TaskFrequency']*wo
    to += formula
prob += to

This gave me:

5*x0 + 8*x1 + 7*x2 + 4*x3 + 6*x4 + 9*x5 + 2*x6 + 1*x7 + 3*x8 + 2*x9 + 7*x10

I also tried this:

for rownum, row in df_long.iterrows():
    for i, wo in enumerate(decision_variables):
        for x, y, z in zip(df_long['Hour'], df_long['TypeofTask'], df_long['TaskFrequency']):
            if rownum == i:
                formula1 = row['TaskFrequency']*wo

With this I just get 7*x10.

What I want is the same kind of expression, but built separately for each Hour instead of everything combined. For example, for Hour 10 it should be:

4*x3 + 2*x6 <= 20

For Hour 9 it should be:

8*x1 + 6*x4 + 1*x7 + 7*x10 <= 30

I look forward to your suggestions and help.

Regards

Diva

What you want back is one expression per hour, so in essence you don't need to apply a function row by row over the whole frame. Instead, condense the DataFrame first, either with groupby (as in the other answer) or by slicing on Hour. I think groupby is the standard way to do it, but slicing with a lambda is a no-brainer:

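def fun1(df, Hours, prod):
    # row.name is the row's index label (the n in xn), used here in place of the decision variable itself
    return sum(df[df['Hour'] == Hours].apply(lambda row: int(row.name) * row['TaskFrequency'], axis=1)) <= prod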
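
For completeness, here is a minimal sketch of the groupby route with the actual PuLP variables instead of the row numbers. It is only an illustration under a few assumptions: the sample frame is retyped from the question with Hour stored as strings, the limit column is assumed to be named TotalTaskatSpecificHour, and the problem name, the maximize sense and the constraint names (hour_08 and so on) are placeholders, since the question does not state an objective.

```python
import pandas as pd
import pulp

# Sample rows retyped from the question (Hour kept as strings, as displayed there).
df_long = pd.DataFrame({
    "Hour": ["08", "09", "08", "10", "09", "08", "10", "09", "08", "08", "09"],
    "TypeofTask": ["A", "D", "D", "C", "B", "B", "A", "D", "C", "E", "A"],
    "TaskFrequency": [5, 8, 7, 4, 6, 9, 2, 1, 3, 2, 7],
    "TotalTaskatSpecificHour": [50, 30, 50, 20, 30, 50, 20, 30, 50, 50, 30],
})

# One non-negative integer variable per row, named x0, x1, ... as in the question.
decision_variables = [
    pulp.LpVariable(f"x{rownum}", lowBound=0, cat="Integer")
    for rownum in df_long.index
]

# Placeholder problem: the name and the sense are assumptions, not from the question.
prob = pulp.LpProblem("task_allocation", pulp.LpMaximize)

constraints = {}
for hour, group in df_long.groupby("Hour"):
    # group keeps the original row numbers as its index,
    # so each frequency can be paired with the variable for its own row.
    terms = [
        int(freq) * decision_variables[rownum]
        for rownum, freq in group["TaskFrequency"].items()
    ]
    # Every row of one hour carries the same limit, so the first value is enough.
    limit = int(group["TotalTaskatSpecificHour"].iloc[0])
    constraints[hour] = pulp.lpSum(terms) <= limit
    prob += constraints[hour], f"hour_{hour}"

# Inspect the constraint built for Hour 10.
print(constraints["10"])
```

The point of grouping is that each group already contains only the rows for one hour, so there is no need for the nested loops and the rownum == i check from the question.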
