How to calculate total occupancy days for each day of the year, given a dataframe of start and end dates?
I have a CSV file (so a list or a DataFrame) containing the start and end dates of visits to a campsite.
start_date end_date
0 2016-01-21 2016-01-24
1 2016-01-28 2016-01-29
2 2016-02-02 2016-02-10
3 2016-02-08 2016-02-12
...
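I'd like to compute a DataFrame with one row for each day in the period: a column with the cumulative number of visitors, a column with the number of visitors present on that day, and a cumulative sum of visitor-days.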
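I currently have some hacky code that reads the visitor data into a plain Python list, visitor_array, and creates another list, year_array, with one entry for each date in the period/year. It then loops over each date in year_array, runs an inner loop over visitor_array, and appends the number of new visitors and the number of visitors in residence (occupancy) to the current element of year_array.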
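import datetime

# visitor_array: list of [start_date, end_date] pairs (datetime.date), read from the CSV
temp_day = datetime.date(2016, 1, 1)
# one single-element list [date] per day; 2016 is a leap year, so 366 days
year_array = [[temp_day + datetime.timedelta(days=d)] for d in range(366)]
for day in year_array:
    new_visitors = 0
    occupancy = 0
    for visitor in visitor_array:
        # compare dates with ==, and compare against day[0] (day is a list)
        if visitor[0] == day[0]:
            new_visitors += 1
        if (visitor[0] <= day[0]) and (day[0] <= visitor[1]):
            occupancy += 1
    # list.append mutates in place and returns None, so don't reassign day
    day.append(new_visitors)
    day.append(occupancy)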
I then convert year_array into a DataFrame, create some cumsum columns and get on with plotting, etc.
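A minimal sketch of that conversion step, assuming year_array holds [date, new_visitors, occupancy] rows as produced by the loop above. The visitor_array here is a hypothetical stand-in built from the four sample visits, and the column names (cumul_visitors, cumul_visitor_days) are illustrative.

```python
import datetime
import pandas as pd

# Hypothetical stand-in for visitor_array: [start_date, end_date] pairs (end inclusive)
visitor_array = [
    [datetime.date(2016, 1, 21), datetime.date(2016, 1, 24)],
    [datetime.date(2016, 1, 28), datetime.date(2016, 1, 29)],
    [datetime.date(2016, 2, 2), datetime.date(2016, 2, 10)],
    [datetime.date(2016, 2, 8), datetime.date(2016, 2, 12)],
]

# Build year_array as [date, new_visitors, occupancy] rows (366 days in 2016)
start = datetime.date(2016, 1, 1)
year_array = []
for offset in range(366):
    day = start + datetime.timedelta(days=offset)
    new_visitors = sum(1 for s, e in visitor_array if s == day)
    occupancy = sum(1 for s, e in visitor_array if s <= day <= e)
    year_array.append([day, new_visitors, occupancy])

# Convert to a DataFrame indexed by date and add the running totals
year_df = pd.DataFrame(year_array, columns=['date', 'new_visitors', 'occupancy'])
year_df['date'] = pd.to_datetime(year_df['date'])
year_df = year_df.set_index('date')
year_df['cumul_visitors'] = year_df['new_visitors'].cumsum()   # distinct visitors so far
year_df['cumul_visitor_days'] = year_df['occupancy'].cumsum()  # visitor-days so far

print(year_df.loc['2016-02-07':'2016-02-11'])
```

Once the data is a date-indexed DataFrame, every running total is a single cumsum() call, and date-string slicing with .loc makes it easy to inspect or plot any sub-period.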
Is there a more elegant, Pythonic/pandas-style way to do all of this in pandas?
Considering df the DataFrame holding the start/end values and d the final DataFrame, I would do something like this:
Code:
import numpy as np
import pandas as pd

# ---- Create df sample
df = pd.DataFrame([['21/01/2016','24/01/2016'],
                   ['28/01/2016','29/01/2016'],
                   ['02/02/2016','10/02/2016'],
                   ['08/02/2016','12/02/2016']], columns=['start','end'])
# the dates are day/month/year: pass the format explicitly so that
# '08/02/2016' is not parsed as 2 August
df['start'] = pd.to_datetime(df['start'], format='%d/%m/%Y')
df['end'] = pd.to_datetime(df['end'], format='%d/%m/%Y')

# ---- Create day index (2016 is a leap year: 366 days)
index = pd.date_range('2016-01-01', '2016-12-31', freq='D')

# ---- Create empty result df
# initialize d with zeros, one row per day, days as a DatetimeIndex
d = pd.DataFrame(np.zeros((len(index), 3)),
                 index=index,
                 columns=['new_visitor','occupancy','occupied_day'])

# ---- Iterate over df to fill d (final df)
for i, row in df.iterrows():
    # Add 1 on the first day of each new visitor
    d.loc[row.start, 'new_visitor'] += 1
    # Flag every day from start to end (inclusive) as occupied
    d.loc[row.start:row.end, 'occupied_day'] = 1
    # Add 1 to the occupancy of every day of this visit
    d.loc[row.start:row.end, 'occupancy'] += 1

# cumulated days = sum of occupied days so far
d['cumul_days'] = d.occupied_day.cumsum()
# cumulated visitors = sum of daily occupancy so far (i.e. visitor-days)
d['cumul_visitors'] = d.occupancy.cumsum()
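For larger files the iterrows loop can be avoided. A minimal sketch of a swapped-in technique (a difference array, not the method above): put +1 on each start date and -1 on the day after each end date, and a cumulative sum then gives the number of visitors present each day. It assumes the same start/end columns with inclusive end dates; the name daily is illustrative.

```python
import pandas as pd

# Sample stays (day/month/year), end date inclusive
df = pd.DataFrame([['21/01/2016', '24/01/2016'],
                   ['28/01/2016', '29/01/2016'],
                   ['02/02/2016', '10/02/2016'],
                   ['08/02/2016', '12/02/2016']], columns=['start', 'end'])
df['start'] = pd.to_datetime(df['start'], format='%d/%m/%Y')
df['end'] = pd.to_datetime(df['end'], format='%d/%m/%Y')

days = pd.date_range('2016-01-01', '2016-12-31', freq='D')

# +1 on each arrival day, -1 on the day after each departure
arrivals = df['start'].value_counts().reindex(days, fill_value=0)
departures = (df['end'] + pd.Timedelta(days=1)).value_counts().reindex(days, fill_value=0)
delta = arrivals - departures

daily = pd.DataFrame(index=days)
daily['new_visitor'] = arrivals
daily['occupancy'] = delta.cumsum()  # visitors present that day
daily['occupied_day'] = (daily['occupancy'] > 0).astype(int)
daily['cumul_days'] = daily['occupied_day'].cumsum()
daily['cumul_visitors'] = daily['occupancy'].cumsum()

print(daily.loc['2016-02-07':'2016-02-13'])
```

A departure falling on 1 January 2017 is simply dropped by reindex, which is correct because no day in the index comes after it. The work is a few vectorized operations regardless of how many stays there are.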
Output of print(d.loc['2016-01-21':'2016-01-29']) (excerpt):
index new_visitor occupancy occupied_day cumul_days cumul_visitors
2016-01-21 1.0 1.0 1.0 1.0 1.0
2016-01-22 0.0 1.0 1.0 2.0 2.0
2016-01-23 0.0 1.0 1.0 3.0 3.0
2016-01-24 0.0 1.0 1.0 4.0 4.0
2016-01-25 0.0 0.0 0.0 4.0 4.0
2016-01-26 0.0 0.0 0.0 4.0 4.0
2016-01-27 0.0 0.0 0.0 4.0 4.0
2016-01-28 1.0 1.0 1.0 5.0 5.0
2016-01-29 0.0 1.0 1.0 6.0 6.0
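The question asks for both the cumulative number of visitors and the cumulative visitor-days, while cumul_visitors in the answer is the running sum of occupancy, i.e. visitor-days. A minimal loop-free sketch that produces both, assuming the same sample stays with inclusive end dates: expand every stay into the days it covers with pd.date_range and count. The column names cumul_new_visitors and cumul_visitor_days are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'start': pd.to_datetime(['2016-01-21', '2016-01-28', '2016-02-02', '2016-02-08']),
    'end': pd.to_datetime(['2016-01-24', '2016-01-29', '2016-02-10', '2016-02-12']),
})
days = pd.date_range('2016-01-01', '2016-12-31', freq='D')

# Expand every stay into the individual days it covers (end date inclusive)
stay_days = pd.Series([day
                       for s, e in zip(df['start'], df['end'])
                       for day in pd.date_range(s, e, freq='D')])

# Visitors present per day, and arrivals per day, on the full-year index
occupancy = stay_days.value_counts().reindex(days, fill_value=0)
new_visitor = df['start'].value_counts().reindex(days, fill_value=0)

result = pd.DataFrame({'new_visitor': new_visitor, 'occupancy': occupancy})
result['cumul_new_visitors'] = result['new_visitor'].cumsum()  # distinct visitors so far
result['cumul_visitor_days'] = result['occupancy'].cumsum()    # visitor-days so far

print(result.loc['2016-01-21':'2016-01-29'])
```

Expanding stays is easy to read but creates one row per visitor-day, so for very long stays or many visitors the difference-array approach uses less memory.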
Hope this code helps!