How to divide total days output by individual year in Python so that the total days don't affect one particular year
I am currently analyzing the response delay for requests made to each department. The data is in the following format:
Department RequestDate ResponseDate
Electronics 2019-05-01 2019-09-19
Babyshop 2018-08-02 2019-09-30
Grocery 2016-01-01 2018-01-01
Pharmacy 2015-03-01 2018-03-01
What I am trying to accomplish is to split the total days across the corresponding years. The expected output is as follows:
Department RequestDate ResponseDate 2015 2016 2017 2018 2019 TotalDays
Electronics 2019-05-01 2019-09-19 0 0 0 0 149 149
Babyshop 2018-08-02 2019-09-30 0 0 0 152 272 424
Grocery 2016-01-01 2018-01-01 0 365 365 1 0 731
Pharmacy 2015-03-01 2018-03-01 306 365 365 60 0 1096
My workflow is currently in Excel, and it is tidy. Is there a way to do this with Python?
I have tried my best to cover every boundary condition in this solution. As for the index, I think you can sort that out yourself.
import calendar as cd
import pandas as pd

df = pd.DataFrame({
    'RequestDate': pd.to_datetime(['2019-05-01', '2018-08-02', '2016-01-01', '2015-03-01']),
    'ResponseDate': pd.to_datetime(['2019-09-19', '2019-09-30', '2018-01-01', '2018-03-01']),
})
# +1 because the sample data appears to count the ResponseDate day itself
# when it comes to the number of days for each year
df['TotalDays'] = (df.ResponseDate - df.RequestDate).dt.days + 1
year_min = df['RequestDate'].min().year
year_max = df['ResponseDate'].max().year
years = list(range(year_min, year_max + 1))
for y in years:
    df[y] = 0
# every year except the last: overlap of the request window with that year,
# clamped to the range 0..length of the year
for y in years[:-1]:
    days_in_year = 366 if cd.isleap(y) else 365
    per_row = []
    for _, row in df.iterrows():
        n = (min(row['ResponseDate'], pd.Timestamp(f'{y + 1}-01-01'))
             - max(row['RequestDate'], pd.Timestamp(f'{y - 1}-12-31'))).days
        per_row.append(max(0, min(n, days_in_year)))
    df[y] = per_row
# the last year receives whatever remains of TotalDays
df[years[-1]] = df['TotalDays'] - df[years[:-1]].sum(axis=1)
df = df[['RequestDate', 'ResponseDate', *years, 'TotalDays']]
df
There may be a better answer, but I can't think of one. Does this work for all of your cases?
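As a possible alternative to the row-by-row loop, the per-year split can be computed column-wise: cut each request window at the year boundaries and measure what remains. This is only a sketch. The helper name `days_by_year` is hypothetical, and it assumes the ResponseDate day itself is not counted, so its totals come out one day lower than in the code above.

```python
import pandas as pd

def days_by_year(df, start_col='RequestDate', end_col='ResponseDate'):
    """Add one column per calendar year with the days of [start, end) falling in it."""
    out = df.copy()
    start = pd.to_datetime(out[start_col])
    end = pd.to_datetime(out[end_col])
    for year in range(start.min().year, end.max().year + 1):
        year_start = pd.Timestamp(year=year, month=1, day=1)
        year_end = pd.Timestamp(year=year + 1, month=1, day=1)
        lo = start.clip(lower=year_start)   # window start, no earlier than Jan 1
        hi = end.clip(upper=year_end)       # window end, no later than next Jan 1
        # a window that misses the year entirely gives a negative length: clamp to 0
        out[year] = (hi - lo).dt.days.clip(lower=0)
    out['TotalDays'] = (end - start).dt.days
    return out

sample = pd.DataFrame({
    'Department': ['Electronics', 'Babyshop', 'Grocery', 'Pharmacy'],
    'RequestDate': ['2019-05-01', '2018-08-02', '2016-01-01', '2015-03-01'],
    'ResponseDate': ['2019-09-19', '2019-09-30', '2018-01-01', '2018-03-01'],
})
result = days_by_year(sample)
print(result)
```

With this convention the year columns always add up to TotalDays, and leap years such as 2016 contribute 366 days.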
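Since I don't have enough reputation to comment here, I am posting this as an answer.
My idea for building this frame is to use datetime and pandas. Assuming your data is in a CSV file, "yourfile.csv":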
import pandas as pd
from datetime import datetime

your_data = pd.read_csv('yourfile.csv')

def take_columns(date):
    '''
    Parse a 'YYYY-MM-DD' string into a datetime
    '''
    return datetime.strptime(date, '%Y-%m-%d')

def count_year(start, end):
    '''
    Returns a dict, with the years as keys, and the
    days of that year as value
    '''
    yearsDict = {}
    delta = end - start
    while delta.days > 0:
        if end.year > start.year:
            # count up to the next January 1, then continue from there
            new_year = datetime(start.year + 1, 1, 1, 0, 0)
            days_year = new_year - start
            yearsDict[start.year] = yearsDict.get(start.year, days_year.days)
            start = new_year
            delta = end - new_year
        elif end.year == start.year:
            # same year: all remaining days belong to it
            yearsDict[start.year] = yearsDict.get(start.year, delta.days)
            break
    return yearsDict

your_data = your_data.set_index(['Department'])  # set the index of the DataFrame
new_columns = set()  # years that need a column
# here we transform the columns into datetime format
your_data['RequestDate'] = your_data['RequestDate'].apply(lambda x: take_columns(str(x)))
your_data['ResponseDate'] = your_data['ResponseDate'].apply(lambda x: take_columns(str(x)))
# collect the years from both date columns; a set avoids repeating years
new_columns.update(your_data['RequestDate'].dt.year)
new_columns.update(your_data['ResponseDate'].dt.year)
# and create the columns
for column_name in range(min(new_columns), max(new_columns) + 1):
    your_data[column_name] = 0
# the 'TotalDays' column, as a whole number of days
your_data['TotalDays'] = (your_data['ResponseDate'] - your_data['RequestDate']).dt.days
# and finally we fill in the values for each year
for row in your_data.index:
    years = count_year(your_data.loc[row, 'RequestDate'], your_data.loc[row, 'ResponseDate'])
    for year in years:
        your_data.at[row, year] = years[year]
Now you can export the result ('your_data') to a file, for example:
your_data.to_csv('your_new_file.csv')
I don't know whether this is the best way, but it works.
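If the file is whitespace-separated like the sample in the question rather than comma-separated, pandas can split the columns and parse the dates while reading, so a conversion helper is not needed. A small sketch, with the sample inlined through `io.StringIO` as a stand-in for a real file (the stand-in and the variable names are my assumptions):

```python
import io
import pandas as pd

raw = """Department RequestDate ResponseDate
Electronics 2019-05-01 2019-09-19
Babyshop 2018-08-02 2019-09-30
Grocery 2016-01-01 2018-01-01
Pharmacy 2015-03-01 2018-03-01
"""

data = pd.read_csv(
    io.StringIO(raw),                            # replace with a file path for real data
    sep=r'\s+',                                  # any run of whitespace separates columns
    parse_dates=['RequestDate', 'ResponseDate'], # parse both columns as datetimes
    index_col='Department',
)
# whole days between request and response, end date not counted
data['TotalDays'] = (data['ResponseDate'] - data['RequestDate']).dt.days
print(data)
```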
Here is a general-purpose function that returns the number of days in each year between two datetime.datetime objects.
import datetime

def days_per_year(dt1, dt2):
    ''' Return a list of years and number of days in that year
    occurring in the range between dt1 and dt2.
    '''
    # remove hours, minutes, seconds and microseconds to turn these into pure dates
    dt1 = dt1.replace(hour=0, minute=0, second=0, microsecond=0)
    dt2 = dt2.replace(hour=0, minute=0, second=0, microsecond=0)
    if dt1 > dt2:
        dt1, dt2 = dt2, dt1  # swap if out of order
    ret = []
    for y in range(dt1.year, dt2.year + 1):
        # clip the range to the year y: from Jan 1 of y up to Jan 1 of y + 1
        year_end = min(dt2, datetime.datetime(y + 1, 1, 1))
        year_start = max(dt1, datetime.datetime(y, 1, 1))
        ret.append((y, (year_end - year_start).days))
    return ret
>>> for RequestDate, ResponseDate in (('2019-05-01','2019-09-19'),('2018-08-02','2019-09-30'),('2016-01-01','2018-01-01'),('2015-03-01','2018-03-01')):
...     RequestDate = datetime.datetime.strptime(RequestDate, '%Y-%m-%d')
...     ResponseDate = datetime.datetime.strptime(ResponseDate, '%Y-%m-%d')
...     print(RequestDate, ResponseDate, days_per_year(RequestDate, ResponseDate))
...
2019-05-01 00:00:00 2019-09-19 00:00:00 [(2019, 141)]
2018-08-02 00:00:00 2019-09-30 00:00:00 [(2018, 152), (2019, 272)]
2016-01-01 00:00:00 2018-01-01 00:00:00 [(2016, 366), (2017, 365), (2018, 0)]
2015-03-01 00:00:00 2018-03-01 00:00:00 [(2015, 306), (2016, 366), (2017, 365), (2018, 59)]
It is not clear whether you want to count the last day: half of your examples count it and the other half do not.
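One way to make that choice explicit is a flag that treats an inclusive end date as an exclusive end one day later. The following is a sketch on plain `datetime.date` values; the function name `days_per_year_inclusive` and the `include_end` parameter are hypothetical, introduced only for illustration.

```python
import datetime

def days_per_year_inclusive(d1, d2, include_end=False):
    """Return [(year, days)] for d1..d2; d2 itself is counted only if include_end."""
    if d1 > d2:
        d1, d2 = d2, d1  # swap if out of order
    # counting the end date is the same as stopping one day later
    stop = d2 + datetime.timedelta(days=1) if include_end else d2
    ret = []
    for y in range(d1.year, d2.year + 1):
        year_start = max(d1, datetime.date(y, 1, 1))
        year_end = min(stop, datetime.date(y + 1, 1, 1))
        ret.append((y, (year_end - year_start).days))
    return ret

request = datetime.date(2016, 1, 1)
response = datetime.date(2018, 1, 1)
print(days_per_year_inclusive(request, response))
# [(2016, 366), (2017, 365), (2018, 0)]
print(days_per_year_inclusive(request, response, include_end=True))
# [(2016, 366), (2017, 365), (2018, 1)]
```

With `include_end=True` the Grocery row gets the single day in 2018 that the question's expected output shows, while the leap year 2016 still counts 366 days rather than the 365 listed there.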