Creating a new column based on multiple conditional statements in a pandas DataFrame

I have a machine dataset with the details below.

Sample df:

[image: sample DataFrame, not preserved]

I need to create a new column called "Quality Match" that indicates whether the current shift's Planned Quality is the same as its Actual Quality.

The conditions are as follows.

1.) First, check whether the Planned Quality is the same as the Actual Quality. If yes, set "Quality Match" to 0.

2.) If they differ:

2.1 Check whether the previous shift's Actual Quality is the same as the current Actual Quality.

2.2 If not, find where the previous shift's Actual Quality last appears in the Planned Quality column, collect all unique planned qualities from after that row up to the current row, and check whether the current Actual Quality is among them.

If either 2.1 or 2.2 is satisfied, set "Quality Match" to -1.

3.) Otherwise, set "Quality Match" to 1.

Example: take row 177. This shift's Planned Quality (A) and Actual Quality (B) are different. The previous shift's Actual Quality (C) is also not the current Actual Quality (B). So we need to check whether the Planned Quality column, before the current shift, contains the previous shift's Actual Quality (C). It does, and its last occurrence is at row 166. We then take all unique planned qualities from there up to the current row (rows 167 to 176) and check whether that list contains the current Actual Quality (B). It does, so "Quality Match" is set to -1.
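The row 177 walk-through can be reproduced with plain lists. This is only a sketch of one reading of rule 2.2 (search backwards from the previous row, then take the planned qualities strictly after the hit and before the current row); the helper name `candidates_for` and the variable `first_index` are made up for illustration.

```python
# Planned and actual qualities for rows 164..179 of the sample data.
planned = list("CCCBBBBBBAAAAAAA")
actual = ["D", "DEFAULT"] + ["C"] * 11 + ["B", "A", "A"]
first_index = 164  # label of the first row (hypothetical helper variable)

def candidates_for(row):
    """Unique planned qualities after the last planned occurrence of the
    previous shift's actual quality, up to (not including) `row`."""
    pos = row - first_index
    prev_actual = actual[pos - 1]
    # Walk backwards to find where prev_actual was last planned.
    for j in range(pos - 1, -1, -1):
        if planned[j] == prev_actual:
            return set(planned[j + 1:pos])
    return set()  # prev_actual was never planned before this row

cands = candidates_for(177)
print(sorted(cands), actual[177 - first_index] in cands)  # ['A', 'B'] True
```

Quality C is last planned at row 166, so the candidates come from rows 167 to 176, and B is among them, which is why row 177 gets -1.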

Final expected output:

[image: expected output, not preserved]

Sample dataset:

# import pandas library
import pandas as pd
from pandas import Timestamp
# dictionary mapping each column name to a {row index: value} dict
details ={'Machine': {164: 'M22',
  165: 'M22',
  166: 'M22',
  167: 'M22',
  168: 'M22',
  169: 'M22',
  170: 'M22',
  171: 'M22',
  172: 'M22',
  173: 'M22',
  174: 'M22',
  175: 'M22',
  176: 'M22',
  177: 'M22',
  178: 'M22',
  179: 'M22'},
 'Start Time': {164: Timestamp('2021-05-31 07:00:00'),
  165: Timestamp('2021-05-31 08:11:12'),
  166: Timestamp('2021-05-31 08:46:12'),
  167: Timestamp('2021-05-31 12:00:00'),
  168: Timestamp('2021-05-31 19:00:00'),
  169: Timestamp('2021-06-01 07:00:00'),
  170: Timestamp('2021-06-01 19:00:00'),
  171: Timestamp('2021-06-02 07:00:00'),
  172: Timestamp('2021-06-02 19:00:00'),
  173: Timestamp('2021-06-02 19:00:00'),
  174: Timestamp('2021-06-03 07:00:00'),
  175: Timestamp('2021-06-03 19:00:00'),
  176: Timestamp('2021-06-04 07:00:00'),
  177: Timestamp('2021-06-04 14:38:42'),
  178: Timestamp('2021-06-04 14:39:27'),
  179: Timestamp('2021-06-04 19:00:00')},
 'End Time': {164: Timestamp('2021-05-31 08:11:12'),
  165: Timestamp('2021-05-31 08:46:12'),
  166: Timestamp('2021-05-31 12:00:00'),
  167: Timestamp('2021-05-31 19:00:00'),
  168: Timestamp('2021-06-01 07:00:00'),
  169: Timestamp('2021-06-01 19:00:00'),
  170: Timestamp('2021-06-02 07:00:00'),
  171: Timestamp('2021-06-02 19:00:00'),
  172: Timestamp('2021-06-02 19:00:00'),
  173: Timestamp('2021-06-03 07:00:00'),
  174: Timestamp('2021-06-03 19:00:00'),
  175: Timestamp('2021-06-04 07:00:00'),
  176: Timestamp('2021-06-04 14:38:42'),
  177: Timestamp('2021-06-04 14:39:27'),
  178: Timestamp('2021-06-04 19:00:00'),
  179: Timestamp('2021-06-05 07:00:00')},
 'shift': {164: 'Day',
  165: 'Day',
  166: 'Day',
  167: 'Day',
  168: 'Night',
  169: 'Day',
  170: 'Night',
  171: 'Day',
  172: 'Night',
  173: 'Night',
  174: 'Day',
  175: 'Night',
  176: 'Day',
  177: 'Day',
  178: 'Day',
  179: 'Night'},
 'Planned Quality': {164: 'C',
  165: 'C',
  166: 'C',
  167: 'B',
  168: 'B',
  169: 'B',
  170: 'B',
  171: 'B',
  172: 'B',
  173: 'A',
  174: 'A',
  175: 'A',
  176: 'A',
  177: 'A',
  178: 'A',
  179: 'A'},
 'Actual Quality': {164: 'D',
  165: 'DEFAULT',
  166: 'C',
  167: 'C',
  168: 'C',
  169: 'C',
  170: 'C',
  171: 'C',
  172: 'C',
  173: 'C',
  174: 'C',
  175: 'C',
  176: 'C',
  177: 'B',
  178: 'A',
  179: 'A'},
 'Planned Shift Production': {164: 75.87,
  165: 317.29,
  166: 206.51,
  167: 54.88,
  168: 258.5,
  169: 658.5,
  170: 658.5,
  171: 658.5,
  172: 743.13,
  173: 329.25,
  174: 658.5,
  175: 658.5,
  176: 419.52,
  177: 0.69,
  178: 238.29,
  179: 658.5},
 'Actual Shift Production': {164: 4.16,
  165: 0.0,
  166: 158.81,
  167: 173.13,
  168: 596.4,
  169: 805.03,
  170: 107.26,
  171: 0.0,
  172: 0.0,
  173: 0.0,
  174: 0.0,
  175: 122.78,
  176: 3323.42,
  177: 0.0,
  178: 2284.28,
  179: 686.7}}

# creating a DataFrame object
df = pd.DataFrame(details)

df

My approach:

I tried to create the Quality Match column using np.select(), but I could not work condition 2.2 into my code.

I would really appreciate your support!
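For context, here is a minimal sketch of how far np.select() gets on its own. It covers rules 1 and 2.1, which only compare a row with itself or with the previous row, but rule 2.2 needs a variable-length look-back that a fixed list of boolean masks does not express directly. The small `demo` frame and the mask names are invented for illustration and are not the poster's actual attempt.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame, not the sample dataset.
demo = pd.DataFrame({
    "Machine": ["M22"] * 5,
    "Planned Quality": ["C", "C", "B", "A", "A"],
    "Actual Quality": ["D", "C", "C", "B", "A"],
})

# Previous shift's actual quality, taken within each machine.
prev_actual = demo.groupby("Machine")["Actual Quality"].shift()

same_as_plan = (demo["Planned Quality"] == demo["Actual Quality"]).to_numpy()
same_as_prev = (demo["Actual Quality"] == prev_actual).to_numpy()

# Rule 1 -> 0, rule 2.1 -> -1, everything else -> 1 (rule 2.2 is NOT handled).
demo["Quality Match"] = np.select([same_as_plan, same_as_prev], [0, -1], default=1)
print(demo["Quality Match"].tolist())  # [1, 0, -1, 1, 0]
```

The fourth row comes out as 1 here, although rule 2.2 would make it -1 (C was last planned in the second row, and B is planned after it). That gap is the part that needs a loop or a custom per-row function.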

There may be more elegant solutions, but the following straightforward approach should do what you want:

# Assumes an integer index without gaps, with each machine's rows
# contiguous and in time order.
for machine in df["Machine"].unique():
    indices = df.index[df["Machine"] == machine].tolist()
    start_index, end_index = indices[0], indices[-1]

    for i in range(start_index, end_index + 1):
        planned = df.at[i, "Planned Quality"]
        actual = df.at[i, "Actual Quality"]
        match = 1  # rule 3: default when nothing below applies

        if planned == actual:
            match = 0  # rule 1
        elif i > start_index:
            previous_actual = df.at[i - 1, "Actual Quality"]
            if actual == previous_actual:
                match = -1  # rule 2.1
            else:
                # Rule 2.2: walk back from row i-2 to the last row where
                # the previous actual quality was planned.
                j = i - 2
                while j >= start_index:
                    if df.at[j, "Planned Quality"] == previous_actual:
                        if actual in df.loc[j:i - 1, "Planned Quality"].tolist():
                            match = -1
                        break
                    j -= 1

        df.at[i, "Quality Match"] = match

Note that this suggestion assumes your dataset is sorted by machine name. Because it does arithmetic on index labels (i-1, i-2), it also relies on the index being consecutive integers.
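If the sorting assumption is a concern, the same rules can be applied per machine with groupby, working on positions inside each group instead of index labels. The following is a sketch only: `match_codes` is a hypothetical helper that mirrors the loop above (including the search starting two rows back), and `demo` is a made-up frame with interleaved machines and a non-consecutive index. Rows are still assumed to be in time order within each machine.

```python
import pandas as pd

def match_codes(planned, actual):
    """Return 0 / -1 / 1 for each row of one machine (rows in time order)."""
    codes = []
    for i, (p, a) in enumerate(zip(planned, actual)):
        code = 1  # rule 3: default
        if p == a:
            code = 0  # rule 1
        elif i > 0:
            prev = actual[i - 1]
            if a == prev:
                code = -1  # rule 2.1
            else:
                # Rule 2.2: last earlier row (from i-2 back) where prev was planned.
                for j in range(i - 2, -1, -1):
                    if planned[j] == prev:
                        if a in planned[j:i]:
                            code = -1
                        break
        codes.append(code)
    return codes

# Made-up data: two machines interleaved, index with gaps.
demo = pd.DataFrame({
    "Machine": ["M1", "M2", "M1", "M2", "M1", "M2"],
    "Planned Quality": ["C", "X", "B", "X", "A", "Y"],
    "Actual Quality": ["C", "Y", "C", "Y", "B", "Y"],
}, index=[10, 11, 20, 21, 30, 31])

demo["Quality Match"] = 0
for _, rows in demo.groupby("Machine", sort=False):
    demo.loc[rows.index, "Quality Match"] = match_codes(
        rows["Planned Quality"].tolist(), rows["Actual Quality"].tolist()
    )

print(demo["Quality Match"].tolist())  # [0, 1, -1, -1, -1, 0]
```

Working on plain lists inside each group keeps the look-back logic independent of how the DataFrame is indexed or ordered across machines.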
