How to count the duration of a field in a given value while having the field change history data?

I'm working with field change history data that has a timestamp for each change of the field value. In this example, I need to calculate the total time the case spent in the 'Termination in Progress' status.

The given case moved into and out of this status three times in total (see screenshot).

I need to add up all three durations for this case. In other cases there can be more or fewer than three.

Does anyone know how to calculate that in Python?

Welcome to Stack Overflow!

Based on the limited data you provided, here is a solution that should work. The code makes some assumptions that could cause errors, so you will want to modify it to suit your needs. I avoided list comprehensions and array math to keep it clearer, since you said you're new to Python.

Assumptions:

  • You're pulling this data into a pandas DataFrame.
  • For every case number, each row with an Old Value of "Termination in progress" (the case leaving the status) has a matching row with a New Value of "Termination in progress" (the case entering it).

import datetime
import pandas as pd
import numpy as np


fp = r'<PATH TO FILE>\\'
f = '<FILENAME>.csv'

data = pd.read_csv(fp+f)
#convert ts to datetime for later use doing time delta calculations
data['Edit Date'] = pd.to_datetime(data['Edit Date'])
# sort by the same case number and date in opposing order to make sure values for old and new align properly
data.sort_values(by = ['CaseNumber','Edit Date'], ascending = [True,False],inplace = True)

#find timestamps where Termination in progress occurs
old_val_ts = data.loc[data['Old Value'] == 'Termination in progress']['Edit Date'].to_list()
new_val_ts = data.loc[data['New Value'] == 'Termination in progress']['Edit Date'].to_list()

#Loop over the timestamps and calc the time delta
ts_deltas = list()
for i in range(len(old_val_ts)):
    item = old_val_ts[i] - new_val_ts[i]
    ts_deltas.append(item)

# this loop could also be accomplished with list comprehension like this:
#ts_deltas = [old_ts - new_ts for (old_ts, new_ts) in zip(old_val_ts, new_val_ts)]

print('Deltas between groups')
print(ts_deltas)
print()

#Sum the time deltas
total_ts_delta = sum(ts_deltas,datetime.timedelta())
print('Total Time Delta')
print(total_ts_delta)

Output:

Deltas between groups
[Timedelta('0 days 00:08:00'), Timedelta('0 days 00:06:00'), Timedelta('0 days 02:08:00')]

Total Time Delta
0 days 02:22:00

I've also attached a picture of the solution, minus my file path for obvious reasons. Hope this helps. Please remember to mark the answer as correct if this solution works for you. Otherwise, let me know what issues you run into.

(Image: the code run against the test data)
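Since the code above depends on a CSV file that isn't shown, here is a minimal self-contained sketch of the same pairing idea. The column names follow the answer's code; the timestamps and the other status names ('Open', 'On hold', 'Closed') are invented for illustration and were chosen only so the durations match the output above. It sorts ascending rather than descending and adds a check for the second assumption (every entry into the status has a matching exit).

```python
import io
import pandas as pd

# Hypothetical sample rows: column names follow the answer's code,
# timestamps and the other status names are made up for illustration.
csv_text = """CaseNumber,Edit Date,Old Value,New Value
1005222,2022-01-03 09:00,Open,Termination in progress
1005222,2022-01-03 09:08,Termination in progress,On hold
1005222,2022-01-04 10:00,On hold,Termination in progress
1005222,2022-01-04 10:06,Termination in progress,Open
1005222,2022-01-05 12:00,Open,Termination in progress
1005222,2022-01-05 14:08,Termination in progress,Closed
"""

data = pd.read_csv(io.StringIO(csv_text), parse_dates=['Edit Date'])
data = data.sort_values(['CaseNumber', 'Edit Date'])

status = 'Termination in progress'
# Timestamps at which the case entered and left the status, oldest first
entered = data.loc[data['New Value'] == status, 'Edit Date'].to_list()
left = data.loc[data['Old Value'] == status, 'Edit Date'].to_list()

# Pairing by position only makes sense if every entry has a matching exit
if len(entered) != len(left):
    raise ValueError(f'unmatched changes: {len(entered)} entries, {len(left)} exits')

deltas = [exit_ts - entry_ts for entry_ts, exit_ts in zip(entered, left)]
total = sum(deltas, pd.Timedelta(0))
print(total)
```

If a case is still in the status when the data was exported, the length check fails rather than silently pairing the wrong rows. How to treat that open interval (ignore it, or count up to the export time) is a decision for your data.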

EDIT:

If you have multiple case numbers to look at, you could do it in various ways. The simplest is to get the unique case numbers with data['CaseNumber'].unique(), then iterate over that array, filter the data for each case number, and store the total time delta in a new list or dictionary. This is not necessarily the most efficient solution, but it will work.

cases_total_td = {}

unique_cases = data['CaseNumber'].unique()
for case in unique_cases:
    # keep only the rows that belong to the current case
    temp_data = data[data['CaseNumber'] == case]

    # find timestamps where Termination in progress occurs, for this case only
    old_val_ts = temp_data.loc[temp_data['Old Value'] == 'Termination in progress']['Edit Date'].to_list()
    new_val_ts = temp_data.loc[temp_data['New Value'] == 'Termination in progress']['Edit Date'].to_list()

    # pair each exit timestamp with its entry timestamp and take the difference
    ts_deltas = [old_ts - new_ts for (old_ts, new_ts) in zip(old_val_ts, new_val_ts)]

    # sum the time deltas
    total_ts_delta = sum(ts_deltas, datetime.timedelta())

    cases_total_td[case] = total_ts_delta

print(cases_total_td)

Output:

{1005222: Timedelta('0 days 02:22:00')}
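As an alternative to the explicit loop over case numbers, the same totals can be sketched with groupby. This is a different technique from the one in the answer: it takes the gap between each edit and the next edit of the same case, then keeps only the gaps that start with an entry into the status. It assumes the history holds changes of this one field only, so the next row for a case is always the exit from the status. The function name time_in_status and the sample rows are hypothetical.

```python
import pandas as pd

def time_in_status(data, status='Termination in progress'):
    """Total time each CaseNumber spent in `status` (hypothetical helper)."""
    df = data.sort_values(['CaseNumber', 'Edit Date']).copy()
    # Gap to the next edit of the same case; NaT for each case's last edit
    df['duration'] = df.groupby('CaseNumber')['Edit Date'].shift(-1) - df['Edit Date']
    # Keep only the rows where the case entered the status
    in_status = df[df['New Value'] == status]
    # NaT gaps (case still in the status at its last edit) are skipped by sum()
    return in_status.groupby('CaseNumber')['duration'].sum()

# Made-up sample data with two cases
data = pd.DataFrame({
    'CaseNumber': [1, 1, 1, 1, 2, 2],
    'Edit Date': pd.to_datetime([
        '2022-01-03 09:00', '2022-01-03 09:08',
        '2022-01-04 10:00', '2022-01-04 10:06',
        '2022-01-03 09:00', '2022-01-03 11:00',
    ]),
    'Old Value': ['Open', 'Termination in progress', 'On hold',
                  'Termination in progress', 'Open', 'Termination in progress'],
    'New Value': ['Termination in progress', 'On hold', 'Termination in progress',
                  'Open', 'Termination in progress', 'Closed'],
})

result = time_in_status(data)
print(result)
```

Cases that never entered the status do not appear in the result at all. If you need them listed with a zero duration, reindex the result on data['CaseNumber'].unique() and fill the gaps with pd.Timedelta(0).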
