How to divide total days output by individual year in Python so that total days doesn't affect one particular year

I am currently analyzing the delay in the response to requests made to each department. The data format is as below:

Department     RequestDate     ResponseDate 
Electronics    2019-05-01      2019-09-19
Babyshop       2018-08-02      2019-09-30
Grocery        2016-01-01      2018-01-01
Pharmacy       2015-03-01      2018-03-01

What I am trying to accomplish is to divide the total days into their respective years. The expected output is as below:

Department     RequestDate     ResponseDate   2015  2016  2017  2018  2019  TotalDays
Electronics    2019-05-01      2019-09-19     0     0     0     0     149   149
Babyshop       2018-08-02      2019-09-30     0     0     0     152   272   424
Grocery        2016-01-01      2018-01-01     0     365   365   1     0     731
Pharmacy       2015-03-01      2018-03-01     306   365   365   60    0     1096

Currently my workflow is in Excel and it is tedious. Is there any way to do this with Python functions?

I have tried my best to include every boundary condition in the solution. As far as the index is concerned, I think you can take care of that.

import calendar as cd
import pandas as pd
df = pd.DataFrame(columns=['RequestDate','ResponseDate'])
df.RequestDate = [pd.Timestamp('2019-05-01'), pd.Timestamp('2018-08-02'), pd.Timestamp('2016-01-01'),pd.Timestamp('2015-03-01')]
df.ResponseDate = [pd.Timestamp('2019-09-19'), pd.Timestamp('2019-09-30'), pd.Timestamp('2018-01-01'),pd.Timestamp('2018-03-01')]


# Add 1 because the sample data appears to count the day of
# ResponseDate itself toward the per-year totals.
df['TotalDays'] = (df.ResponseDate - df.RequestDate).dt.days + 1
year_min = df['RequestDate'].min().year
year_max = df['ResponseDate'].max().year
years = [i for i in range(year_min,year_max+1)]


for i in years:
    df[i]=0
df.columns=['RequestDate','ResponseDate', 'TotalDays', *years]
l=[]


for i in range(len(years)-1):
    z=[]
    for item, row in df.iterrows():
        row[years[i]] = (min(row['ResponseDate'], pd.Timestamp(f'{years[i]+1}-01-01'))-max(row['RequestDate'], pd.Timestamp(f'{years[i]-1}-12-31'))).days
        if cd.isleap(years[i])==True:
            if row[years[i]]<=0:
                row[years[i]]=0
            elif row[years[i]]>366:
                row[years[i]]=366
        else:
            if row[years[i]]<=0:
                row[years[i]]=0
            elif row[years[i]]>365:
                row[years[i]]=365

        z.append(row[years[i]])
    l.append(z)


for i in range(len(years)-1):
    df[years[i]]=l[i]
df[years[-1]]=df['TotalDays']-df.iloc[:, 3:-1].sum(axis=1)
df=df[['RequestDate','ResponseDate',*years,'TotalDays']]
df

There could be better answers, but I can't think of them. Does this work in all your cases?
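The same clipping idea (intersect each request/response interval with each calendar year) can also be written without the row loop. The following is only a sketch: `split_days_by_year` is a hypothetical helper name, and it assumes, like the answer above, that the ResponseDate itself is counted.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["Electronics", "Babyshop", "Grocery", "Pharmacy"],
    "RequestDate": pd.to_datetime(["2019-05-01", "2018-08-02", "2016-01-01", "2015-03-01"]),
    "ResponseDate": pd.to_datetime(["2019-09-19", "2019-09-30", "2018-01-01", "2018-03-01"]),
})

def split_days_by_year(frame, start_col="RequestDate", end_col="ResponseDate"):
    """Hypothetical helper: add one column per calendar year plus TotalDays.

    Assumption: the ResponseDate itself is counted (inclusive end).
    """
    out = frame.copy()
    # Exclusive end = day after ResponseDate, so plain subtraction counts it.
    end_excl = out[end_col] + pd.Timedelta(days=1)
    first_year = out[start_col].min().year
    last_year = out[end_col].max().year
    for year in range(first_year, last_year + 1):
        year_start = pd.Timestamp(year=year, month=1, day=1)
        year_end = pd.Timestamp(year=year + 1, month=1, day=1)
        # Overlap of [RequestDate, end_excl) with [year_start, year_end).
        lo = out[start_col].clip(lower=year_start)
        hi = end_excl.clip(upper=year_end)
        # Negative overlap means the interval misses this year entirely.
        out[year] = (hi - lo).dt.days.clip(lower=0)
    out["TotalDays"] = (end_excl - out[start_col]).dt.days
    return out

result = split_days_by_year(df)
print(result)
```

Because every year is computed from actual calendar boundaries, leap years need no special case, and the per-year columns always add up to TotalDays.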

Since I don't have enough reputation to comment, here is an answer.

My idea for building this frame uses datetime and pandas. Supposing your data is in a CSV file, 'yourfile.csv':

import pandas as pd
from datetime import datetime
import time

your_data = pd.read_csv('yourfile.csv')

def take_columns(date):
    '''
    Transform the columns into datetime type
    '''
    date = datetime(*(time.strptime(date, '%Y-%m-%d')[:6]))
    return date

def count_year(start, end):
    '''
    Returns a dict, with the years as keys, and the
    days of that year as value
    '''
    yearsDict = {}
    # walk forward one calendar year at a time; each step stops at the
    # next January 1st or at `end`, whichever comes first
    # (the end day itself is not counted)
    while start < end:
        new_year = datetime(start.year+1,1,1,0,0)
        stop = min(end, new_year)
        yearsDict[start.year] = (stop - start).days
        start = stop
    return yearsDict



your_data = your_data.set_index(['Department']) #set the index of the DataFrame
new_columns = set() #to add the new columns with the years

#here we transform the columns into datetime format
your_data['RequestDate'] = your_data['RequestDate'].apply(lambda x: take_columns(str(x)))
your_data['ResponseDate'] = your_data['ResponseDate'].apply(lambda x: take_columns(str(x)))

#now we're gonna read both date columns to make a set with the years
#the set is to avoid repeating the years
your_data['RequestDate'].apply(lambda x: new_columns.add(x.year))
your_data['ResponseDate'].apply(lambda x: new_columns.add(x.year))

#and create the columns
for column_name in range(min(new_columns), max(new_columns)+1):
    your_data[column_name] = 0

your_data['TotalDays'] = (your_data['ResponseDate'] - your_data['RequestDate']).dt.days #whole days for the 'TotalDays' column

#and finally we add the values on the years
for row in your_data.index:
    years = count_year(your_data.loc[row]['RequestDate'],your_data.loc[row]['ResponseDate'])
    for year in years:
        your_data.at[row,year] = years[year]

Now you can export the result ('your_data') to a file, for example:

your_data.to_csv('your_new_file.csv')

I don't know if it is the best way, but it worked.
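As a side note, the date conversion step can probably be left to pandas itself. This is a minimal sketch, assuming the file is comma-separated with a header row; the inline `csv_text` is only a stand-in for 'yourfile.csv'.

```python
import io
import pandas as pd

# Stand-in for 'yourfile.csv' (assumed comma-separated with a header row).
csv_text = """Department,RequestDate,ResponseDate
Electronics,2019-05-01,2019-09-19
Babyshop,2018-08-02,2019-09-30
"""

# parse_dates converts the two columns to datetime64 while reading,
# and index_col sets the Department index in the same call.
data = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["RequestDate", "ResponseDate"],
    index_col="Department",
)

# .dt.days turns the Timedelta difference into a plain integer day count.
data["TotalDays"] = (data["ResponseDate"] - data["RequestDate"]).dt.days
print(data)
```

With a real file, `io.StringIO(csv_text)` would simply be replaced by the file path.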

Here's a generic function that returns the number of days in each year between two datetime.datetime objects.

import datetime

def days_per_year(dt1, dt2):
    ''' Return a list of years and number of days in that year
        occurring in the range between dt1 and dt2.
    '''
    # remove hours,minutes,seconds,microseconds to turn these into pure dates
    dt1 = dt1.replace(hour=0, minute=0, second=0, microsecond=0)
    dt2 = dt2.replace(hour=0, minute=0, second=0, microsecond=0)
    if dt1 > dt2:
        dt1, dt2 = dt2, dt1 # swap if out of order
    ret = []
    for y in range(dt1.year, dt2.year + 1):
        year_end = min(dt2, datetime.datetime(y + 1, 1, 1))
        year_start = max(dt1, datetime.datetime(y, 1, 1))
        ret.append((y, (year_end - year_start).days))
    return ret

>>> for RequestDate, ResponseDate in (('2019-05-01','2019-09-19'),('2018-08-02','2019-09-30'),('2016-01-01','2018-01-01'),('2015-03-01','2018-03-01')):
    RequestDate = datetime.datetime.strptime(RequestDate, '%Y-%m-%d')
    ResponseDate = datetime.datetime.strptime(ResponseDate, '%Y-%m-%d')
    print(RequestDate, ResponseDate, days_per_year(RequestDate, ResponseDate))

2019-05-01 00:00:00 2019-09-19 00:00:00 [(2019, 141)]
2018-08-02 00:00:00 2019-09-30 00:00:00 [(2018, 152), (2019, 272)]
2016-01-01 00:00:00 2018-01-01 00:00:00 [(2016, 366), (2017, 365), (2018, 0)]
2015-03-01 00:00:00 2018-03-01 00:00:00 [(2015, 306), (2016, 366), (2017, 365), (2018, 59)]

It's unclear whether you want the last day to count or not; half of your examples count it and half don't.
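If it helps, the choice can be made explicit with a flag. This is just a sketch on plain `datetime.date` objects; `days_per_year_flag` and its `include_last_day` parameter are made-up names for illustration, not part of the function above.

```python
import datetime

def days_per_year_flag(d1, d2, include_last_day=False):
    """Hypothetical variant: split the span d1..d2 into (year, days) pairs.

    include_last_day is an illustrative parameter; when True, d2 itself
    is counted.
    """
    if d1 > d2:
        d1, d2 = d2, d1  # swap if out of order
    # Work with an exclusive end so each year's count is a plain subtraction.
    end = d2 + datetime.timedelta(days=1) if include_last_day else d2
    result = []
    for year in range(d1.year, d2.year + 1):
        year_start = max(d1, datetime.date(year, 1, 1))
        year_end = min(end, datetime.date(year + 1, 1, 1))
        result.append((year, max((year_end - year_start).days, 0)))
    return result

request = datetime.date(2015, 3, 1)
response = datetime.date(2018, 3, 1)
print(days_per_year_flag(request, response))
print(days_per_year_flag(request, response, include_last_day=True))
```

For the Pharmacy row, the 2018 entry is 59 without the flag and 60 with it, which matches the 60 in the expected output.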
