How to do conditional calculations in Python

I have sample data that looks like the table below.

The DateDuration column was calculated in Excel using the following logic:

  • If SecondDate - FirstDate >= 28, then DateDuration = SecondDate - FirstDate.
  • If SecondDate - FirstDate < 28 and ThirdDate is nan, then DateDuration = SecondDate - FirstDate.
  • If SecondDate - FirstDate < 28 and ThirdDate is not nan, then consider ThirdDate - FirstDate:
    • If ThirdDate - FirstDate >= 28, then DateDuration = ThirdDate - FirstDate.
    • If ThirdDate - FirstDate < 28 and FourthDate is nan, then DateDuration = ThirdDate - FirstDate.
    • If ThirdDate - FirstDate < 28 and FourthDate is not nan, then DateDuration = FourthDate - FirstDate.
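
The rules above form a cascade, which can be sketched in plain Python. This is only a minimal illustration: it assumes the day differences have already been computed and that a missing date is represented as None. The function name and its arguments are hypothetical and do not come from the original post.

```python
from typing import Optional

THRESHOLD = 28  # minimum number of days, taken from the rules above


def date_duration(d2: int, d3: Optional[int], d4: Optional[int]) -> int:
    """Return DateDuration from day differences measured from FirstDate.

    d2 = SecondDate - FirstDate, d3 = ThirdDate - FirstDate,
    d4 = FourthDate - FirstDate; None stands for a missing date.
    """
    # Keep the second gap if it is long enough or there is no ThirdDate
    if d2 >= THRESHOLD or d3 is None:
        return d2
    # Otherwise keep the third gap if it is long enough or there is no FourthDate
    if d3 >= THRESHOLD or d4 is None:
        return d3
    # Both earlier gaps were too short, so fall back to the fourth gap
    return d4


print(date_duration(14, 67, 88))     # second gap too short, third gap is used
print(date_duration(7, None, None))  # no ThirdDate, second gap is kept
```

The first call corresponds to ID 2914300 in the sample data and the second to ID 8454947.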

I would like to calculate DateDuration in Python but do not know how to go about it.

Column data types in Python:

  • ID int64
  • FirstDate object
  • SecondDate object
  • ThirdDate object
  • FourthDate object

I am new to Python. Any help would be greatly appreciated!

import pandas as pd
import numpy as np

# Convert all four date columns; missing values become NaT
for col in ['FirstDate', 'SecondDate', 'ThirdDate', 'FourthDate']:
    df[col] = pd.to_datetime(df[col])

df['DayDifference2'] = df['SecondDate'] - df['FirstDate']
df['DayDifference3'] = df['ThirdDate'] - df['FirstDate']
df['DayDifference4'] = df['FourthDate'] - df['FirstDate']

x = df['DayDifference2'].dt.days
y = df['DayDifference3'].dt.days
z = df['DayDifference4'].dt.days

# Attempt so far: only the first 28-day threshold is handled
condlist = [x < 28, x >= 28]
choicelist = [df['DayDifference3'], df['DayDifference2']]
np.select(condlist, choicelist)

My data:

ID        FirstDate   SecondDate  ThirdDate   FourthDate  DateDuration
2914300   2021-09-23  2021-10-07  2021-11-29  2021-12-20  67
3893461   2021-09-08  2021-10-06  2022-04-07              211
4343075   2021-06-23  2021-09-27                          96
4347772   2021-06-23  2021-09-27                          96
4551963   2021-08-02  2021-10-14  2022-03-11              73
4893324   2021-09-30  2021-10-01  2022-03-03  2022-03-10  154
5239991   2021-06-24  2021-08-26  2021-09-25  2022-02-03  63
8454947   2021-09-28  2021-10-05                          7
8581390   2021-09-27  2022-03-21  2022-03-25              175
8763766   2021-09-20  2021-10-04  2021-12-09              80
9144185   2021-06-18  2021-06-23                          5
9967685   2021-09-13  2021-10-29  2022-02-07  2022-03-23  46
11367560  2021-08-31  2021-09-28  2021-10-21  2022-02-11  51

Refer to the built-in datetime module. It provides several date and time classes, which I have used in my own workout routine maker.

import datetime as dt

# particularly the types:
dt.timedelta(1)
# and
dt.time(minute=0, second=0)
# there are also date classes you can use. 

Documentation: https://docs.python.org/3/library/datetime.html
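
As a small illustration of how these classes relate to the question, subtracting two date objects yields a timedelta whose days attribute can be compared against the 28-day threshold. The two dates are taken from the first row of the sample data; the variable names are only for illustration.

```python
import datetime as dt

first = dt.date(2021, 9, 23)    # FirstDate of ID 2914300
second = dt.date(2021, 10, 7)   # SecondDate of ID 2914300

gap = second - first            # subtracting dates gives a datetime.timedelta
print(gap.days)                 # whole days between the two dates

# A timedelta can also be compared directly with another timedelta
print(gap >= dt.timedelta(days=28))
```

For a whole table of dates, the pandas approach in the other answer is likely more convenient than looping over datetime objects.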

import pandas as pd
import numpy as np

df = pd.read_csv('date_example.csv')

# Parse the four date columns; empty cells become NaT
date_cols = ['FirstDate', 'SecondDate', 'ThirdDate', 'FourthDate']
df[date_cols] = df[date_cols].apply(pd.to_datetime)
df

NaT is the missing value for the datetime64[ns] type.
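
How NaT behaves in the later steps can be checked with a short sketch. It uses a small hand-built frame (the name dates is hypothetical) in place of the CSV file: a difference involving NaT is itself missing, and comparing a missing day count with 28 evaluates to False.

```python
import pandas as pd

# Two rows: one without a ThirdDate, one with
dates = pd.DataFrame({
    "FirstDate": ["2021-09-28", "2021-09-23"],
    "ThirdDate": [None, "2021-11-29"],
})
dates = dates.apply(pd.to_datetime)  # None becomes NaT

diff = (dates["ThirdDate"] - dates["FirstDate"]).dt.days

print(dates["ThirdDate"].isna().tolist())  # which rows lack a ThirdDate
print(diff.isna().tolist())                # the day difference is missing too
print((diff >= 28).tolist())               # comparisons with a missing value are False
```

This is why the conditions test isna() and notna() explicitly instead of relying on the comparisons alone.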




Conditions & Choices

# Day differences from FirstDate (NaN where the later date is missing)
d2 = (df['SecondDate'] - df['FirstDate']).dt.days
d3 = (df['ThirdDate'] - df['FirstDate']).dt.days
d4 = (df['FourthDate'] - df['FirstDate']).dt.days

conditions = [
    d2 >= 28,
    (d2 < 28) & df['ThirdDate'].isna(),
    (d2 < 28) & df['ThirdDate'].notna() & (d3 >= 28),
    (d2 < 28) & df['ThirdDate'].notna() & (d3 < 28) & df['FourthDate'].isna(),
    (d2 < 28) & df['ThirdDate'].notna() & (d3 < 28) & df['FourthDate'].notna()
]

# One choice per condition, in the same order
choices = [d2, d2, d3, d3, d4]



df['Duration'] = np.select(conditions, choices)
df
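
To try the approach without the CSV file, the following self-contained sketch builds four rows of the sample data inline (the frame name sample is hypothetical). It also relies on the fact that np.select uses the first condition that is true for each row, so later conditions do not have to repeat the earlier ones. This is a condensed variant for illustration, not a replacement for the code above.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "ID": [2914300, 4343075, 8454947, 4893324],
    "FirstDate": ["2021-09-23", "2021-06-23", "2021-09-28", "2021-09-30"],
    "SecondDate": ["2021-10-07", "2021-09-27", "2021-10-05", "2021-10-01"],
    "ThirdDate": ["2021-11-29", None, None, "2022-03-03"],
    "FourthDate": ["2021-12-20", None, None, "2022-03-10"],
})
date_cols = ["FirstDate", "SecondDate", "ThirdDate", "FourthDate"]
sample[date_cols] = sample[date_cols].apply(pd.to_datetime)

d2 = (sample["SecondDate"] - sample["FirstDate"]).dt.days
d3 = (sample["ThirdDate"] - sample["FirstDate"]).dt.days
d4 = (sample["FourthDate"] - sample["FirstDate"]).dt.days

# Conditions are checked in order; the first True one decides the row
conditions = [
    (d2 >= 28) | d3.isna(),  # second gap long enough, or no ThirdDate
    (d3 >= 28) | d4.isna(),  # third gap long enough, or no FourthDate
    d4.notna(),              # otherwise use the fourth gap
]
result = np.select(conditions, [d2, d3, d4], default=np.nan)

# Every row in this sample matches a condition, so no NaN remains
sample["Duration"] = result.astype(int)
print(sample[["ID", "Duration"]])
```

For these four rows the Duration column matches the DateDuration values listed in the question (67, 96, 7 and 154).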

Result: [image]



Discussion: There are some differences from your DateDuration column. For example, in the second row (ID = 3893461), SecondDate - FirstDate is 28, so according to your condition (if SecondDate - FirstDate >= 28, then DateDuration = SecondDate - FirstDate) the result is 28 rather than the 211 in your data. The same thing happens in the last row, ID = 11367560.
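
One possible explanation, offered only as an assumption because the original Excel formula is not shown, is that the spreadsheet used a strict > 28 comparison instead of >= 28. A quick check on the two rows mentioned above, assuming the third date of ID 3893461 belongs to ThirdDate:

```python
import pandas as pd

rows = pd.DataFrame({
    "ID": [3893461, 11367560],
    "FirstDate": ["2021-09-08", "2021-08-31"],
    "SecondDate": ["2021-10-06", "2021-09-28"],
    "ThirdDate": ["2022-04-07", "2021-10-21"],
})
for col in ["FirstDate", "SecondDate", "ThirdDate"]:
    rows[col] = pd.to_datetime(rows[col])

d2 = (rows["SecondDate"] - rows["FirstDate"]).dt.days
d3 = (rows["ThirdDate"] - rows["FirstDate"]).dt.days

# Keep d2 where the condition holds, otherwise fall back to d3
inclusive = d2.where(d2 >= 28, d3)  # rule as written in the question
strict = d2.where(d2 > 28, d3)      # hypothetical strict comparison

print(inclusive.tolist())
print(strict.tolist())
```

With the strict comparison both rows reproduce the DateDuration values in the sample data (211 and 51), but this remains a guess about how the spreadsheet was built.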
