
How to divide total days output by individual year in Python so that the total days don't affect one particular year

I am currently analyzing the response latency for requests made to each department. The data is in the following format:

Department     RequestDate     ResponseDate 
Electronics    2019-05-01      2019-09-19
Babyshop       2018-08-02      2019-09-30
Grocery        2016-01-01      2018-01-01
Pharmacy       2015-03-01      2018-03-01

What I am trying to do is split the total days across the corresponding years. The expected output is as follows:

Department     RequestDate     ResponseDate   2015  2016  2017  2018  2019    TotalDays
Electronics    2019-05-01      2019-09-19      0      0    0     0     149     149
Babyshop       2018-08-02      2019-09-30      0      0    0     152   272     424
Grocery        2016-01-01      2018-01-01      0      365  365   1     0       731
Pharmacy       2015-03-01      2018-03-01      306    365  365   60    0       1096
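
To make the split concrete, here is a minimal sketch of the arithmetic behind the Babyshop row, using only the standard library. It assumes the response date itself is not counted, and the variable names are illustrative rather than part of the question.

```python
from datetime import date

request = date(2018, 8, 2)
response = date(2019, 9, 30)
new_year = date(2019, 1, 1)  # boundary between the two calendar years

days_2018 = (new_year - request).days   # 2018-08-02 up to the end of 2018
days_2019 = (response - new_year).days  # 2019-01-01 up to, but excluding, 2019-09-30
total_days = (response - request).days

print(days_2018, days_2019, total_days)  # 152 272 424
```

The two per-year figures add up to the total, which is the property the year columns are meant to preserve.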

My current workflow is in Excel, and it is tedious. Is there a way to do this with Python?

I have tried my best to cover every boundary condition in this solution. As for the indexing, I think you can work that out yourself.

import calendar as cd
import pandas as pd

df = pd.DataFrame({
    'RequestDate': pd.to_datetime(['2019-05-01', '2018-08-02', '2016-01-01', '2015-03-01']),
    'ResponseDate': pd.to_datetime(['2019-09-19', '2019-09-30', '2018-01-01', '2018-03-01']),
})

# +1 because the sample data seems to count the ResponseDate day itself
# when tallying the number of days for each year
df['TotalDays'] = (df.ResponseDate - df.RequestDate).dt.days + 1
year_min = df['RequestDate'].min().year
year_max = df['ResponseDate'].max().year
years = list(range(year_min, year_max + 1))

# For every year except the last, take the overlap between the
# request/response window and that year, clamped to [0, days in the year]
for year in years[:-1]:
    window_start = df['RequestDate'].clip(lower=pd.Timestamp(f'{year - 1}-12-31'))
    window_end = df['ResponseDate'].clip(upper=pd.Timestamp(f'{year + 1}-01-01'))
    days_in_year = 366 if cd.isleap(year) else 365
    df[year] = (window_end - window_start).dt.days.clip(lower=0, upper=days_in_year)

# The last year receives the remainder, so the year columns sum to TotalDays
df[years[-1]] = df['TotalDays'] - df[years[:-1]].sum(axis=1)
df = df[['RequestDate', 'ResponseDate', *years, 'TotalDays']]
df

There may be a better answer, but I couldn't think of one. Does this work for all of your cases?
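
A quick way to probe whether a split handles every case is to check that the per-year counts always add up to the total. The sketch below is not the answer's code: it uses a hypothetical helper, `split_by_year`, built on plain `datetime.date` objects, assumes the response date is excluded and that the request never comes after the response, and runs a few edge cases through that check.

```python
from datetime import date

def split_by_year(start, end):
    """Hypothetical helper: map each year to the days of [start, end) that fall in it."""
    result = {}
    for year in range(start.year, end.year + 1):
        lo = max(start, date(year, 1, 1))     # later of the range start and Jan 1
        hi = min(end, date(year + 1, 1, 1))   # earlier of the range end and next Jan 1
        result[year] = max((hi - lo).days, 0)
    return result

edge_cases = [
    (date(2016, 1, 1), date(2016, 12, 31)),   # entirely inside a leap year
    (date(2019, 5, 1), date(2019, 5, 1)),     # request and response on the same day
    (date(2015, 3, 1), date(2018, 3, 1)),     # spans several years
    (date(2017, 12, 31), date(2018, 1, 1)),   # crosses a single year boundary
]

for start, end in edge_cases:
    per_year = split_by_year(start, end)
    # the yearly pieces must add up to the whole interval
    assert sum(per_year.values()) == (end - start).days, (start, end, per_year)
```

The same idea carries over to a DataFrame: summing the year columns row by row and comparing the result with `TotalDays` flags any row where the split went wrong.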

Since I don't have enough reputation to comment here, I am posting this as an answer.

My idea for building this frame is to use datetime and pandas. Assuming your data is in a CSV file named "yourfile.csv":

import pandas as pd
from datetime import datetime

your_data = pd.read_csv('yourfile.csv')

def take_columns(date):
    '''
    Parse a 'YYYY-MM-DD' string into a datetime object
    '''
    return datetime.strptime(date, '%Y-%m-%d')

def count_year(start, end):
    '''
    Returns a dict, with the years as keys, and the
    days of that year as value
    '''
    years_dict = {}
    if end <= start:
        return years_dict
    # walk forward one calendar year at a time until the final year is reached
    while start.year < end.year:
        new_year = datetime(start.year + 1, 1, 1)
        years_dict[start.year] = (new_year - start).days
        start = new_year
    # whatever is left falls in the final year
    years_dict[end.year] = (end - start).days
    return years_dict


your_data = your_data.set_index(['Department'])  # set the index of the DataFrame

# here we transform the columns into datetime format
your_data['RequestDate'] = your_data['RequestDate'].apply(lambda x: take_columns(str(x)))
your_data['ResponseDate'] = your_data['ResponseDate'].apply(lambda x: take_columns(str(x)))

# collect the years covered by both date columns;
# a set is used to avoid repeating the years
new_columns = set(your_data['RequestDate'].dt.year) | set(your_data['ResponseDate'].dt.year)

# and create the columns
for column_name in range(min(new_columns), max(new_columns) + 1):
    your_data[column_name] = 0

# this is for the 'TotalDays' column, stored as a whole number of days
your_data['TotalDays'] = (your_data['ResponseDate'] - your_data['RequestDate']).dt.days

# and finally we add the values on the years
for row in your_data.index:
    years = count_year(your_data.at[row, 'RequestDate'], your_data.at[row, 'ResponseDate'])
    for year, days in years.items():
        your_data.at[row, year] = days

Now you can export the result ('your_data') to a file, for example:

your_data.to_csv('your_new_file.csv')

I don't know whether this is the best approach, but it works.

Here is a general-purpose function that returns the number of days in each year between two datetime.datetime objects.

import datetime

def days_per_year(dt1, dt2):
    ''' Return a list of years and number of days in that year
        occurring in the range between dt1 and dt2.
    '''
    # remove hours, minutes, seconds and microseconds to turn these into pure dates
    dt1 = dt1.replace(hour=0, minute=0, second=0, microsecond=0)
    dt2 = dt2.replace(hour=0, minute=0, second=0, microsecond=0)
    if dt1 > dt2:
        dt1, dt2 = dt2, dt1 # swap if out of order
    ret = []
    for y in range(dt1.year, dt2.year + 1):
        # clamp the range to the boundaries of year y
        year_end = min(dt2, datetime.datetime(y + 1, 1, 1))
        year_start = max(dt1, datetime.datetime(y, 1, 1))
        ret.append((y, (year_end - year_start).days))
    return ret

>>> for RequestDate, ResponseDate in (('2019-05-01','2019-09-19'),('2018-08-02','2019-09-30'),('2016-01-01','2018-01-01'),('2015-03-01','2018-03-01')):
...     RequestDate = datetime.datetime.strptime(RequestDate, '%Y-%m-%d')
...     ResponseDate = datetime.datetime.strptime(ResponseDate, '%Y-%m-%d')
...     print(RequestDate, ResponseDate, days_per_year(RequestDate, ResponseDate))
...

2019-05-01 00:00:00 2019-09-19 00:00:00 [(2019, 141)]
2018-08-02 00:00:00 2019-09-30 00:00:00 [(2018, 152), (2019, 272)]
2016-01-01 00:00:00 2018-01-01 00:00:00 [(2016, 366), (2017, 365), (2018, 0)]
2015-03-01 00:00:00 2018-03-01 00:00:00 [(2015, 306), (2016, 366), (2017, 365), (2018, 59)]

It is not clear whether you want to count the last day: half of your examples do and half don't.
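
The ambiguity shows up in the final-year column of the question's table. As a rough sketch (the helper `days_in_final_year` is hypothetical and assumes the request was made in an earlier year than the response), the following compares both conventions against the values the question lists:

```python
from datetime import date

def days_in_final_year(response, inclusive):
    """Days from Jan 1 of the response year to the response date.

    Assumes the request was made in an earlier year.
    """
    jan1 = date(response.year, 1, 1)
    return (response - jan1).days + (1 if inclusive else 0)

# (department, response date, final-year value shown in the question's table)
rows = [
    ("Babyshop", date(2019, 9, 30), 272),
    ("Grocery", date(2018, 1, 1), 1),
    ("Pharmacy", date(2018, 3, 1), 60),
]

for name, response, expected in rows:
    exclusive = days_in_final_year(response, inclusive=False)
    inclusive = days_in_final_year(response, inclusive=True)
    print(name, exclusive, inclusive, expected)

# Babyshop 272 273 272
# Grocery 0 1 1
# Pharmacy 59 60 60
```

Babyshop matches the exclusive count, while Grocery and Pharmacy match the inclusive one, so whichever convention is chosen should be applied to every row.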

