Split business days into their respective months
df1
no  From       To         check
1   27-Jan-20  28-Mar-20  a
2   28-Mar-20  12-Apr-20  a
3   29-May-20  29-May-20  b
4   5-Apr-20   12-Apr-20  b
df2
col1  col2
a     9-Apr-20
b     30-Mar-20
df
no  From       To         check  total  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
1   27-Jan-20  28-Mar-20  a      45     5    20   20
2   28-Mar-20  12-Apr-20  a      9                2    7
3   29-May-20  29-May-20  b      1                          1
4   5-Apr-20   12-Apr-20  b      5                     5
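I need to calculate two things.
For part 1: the "total" column in df1 is calculated using
np.busday_count('2020-01-27','2020-03-28')
but this is not accurate, and I cannot include the holidays (df2) in it. I tried to create the column directly on the dataframe using
df['total']=np.busday_count(df1['From'].astype('datetime64[D]')
,df1['To'].astype('datetime64[D]'))
but it raised an error.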
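The question does not show the error message. One plausible cause is that np.busday_count needs NumPy datetime64[D] values, while the pandas columns hold strings or nanosecond timestamps. Note also that np.busday_count excludes the end date and accepts a holidays argument. The following is a minimal sketch under those assumptions; it rebuilds the sample frames from the tables above, and the date format string fmt and the row-by-row loop are illustrative choices, not taken from the original post.

```python
import numpy as np
import pandas as pd

# Sample data copied from the tables in the question.
df1 = pd.DataFrame({
    "no": [1, 2, 3, 4],
    "From": ["27-Jan-20", "28-Mar-20", "29-May-20", "5-Apr-20"],
    "To": ["28-Mar-20", "12-Apr-20", "29-May-20", "12-Apr-20"],
    "check": ["a", "a", "b", "b"],
})
df2 = pd.DataFrame({"col1": ["a", "b"], "col2": ["9-Apr-20", "30-Mar-20"]})

fmt = "%d-%b-%y"  # assumed format of the date strings (English month names)

def to_days(series):
    """Parse date strings and convert them to a NumPy datetime64[D] array."""
    return pd.to_datetime(series, format=fmt).to_numpy().astype("datetime64[D]")

start = to_days(df1["From"])
end = to_days(df1["To"])
# Look up the holiday that applies to each row through its 'check' key.
holiday = to_days(df1["check"].map(df2.set_index("col1")["col2"]))

one_day = np.timedelta64(1, "D")
# busday_count excludes the end date, so shift it by one day to include 'To'.
# The holiday differs per row, so the count is done row by row.
df1["total"] = [
    int(np.busday_count(s, e + one_day, holidays=[h]))
    for s, e, h in zip(start, end, holiday)
]
print(df1)
```

With the sample data this reproduces the "total" column of the expected output, including the single-day row, which a plain np.busday_count call would report as 0.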
You can use bdate_range inside a custom function:
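import pandas as pd

# dict mapping month number to alias, e.g. {1: 'JAN', ..., 12: 'DEC'}
# (MONTH_ALIASES is not a documented public name and may differ between pandas versions)
months = pd.tseries.frequencies.MONTH_ALIASES
df2['col2'] = pd.to_datetime(df2['col2'], dayfirst=True)
# month of the holiday that applies to each row
df['holiday'] = df['check'].map(df2.set_index(['col1'])['col2']).dt.month

def count_by_month(s):
    start, end, holiday = s['From'], s['To'], s['holiday']
    # month number of every weekday between start and end (inclusive)
    valid_dates = pd.bdate_range(start=start, end=end).month
    count = dict(pd.Series(valid_dates).value_counts())
    # subtract the holiday; note this checks only the month, not whether
    # the holiday date itself lies between start and end
    if holiday in count:
        count[holiday] -= 1
    return pd.concat([s, pd.Series({v: count.get(k, 0) for k, v in months.items()})], axis=0)

# apply row by row to append the twelve month columns
df = df.apply(count_by_month, axis=1)
print(df)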
   no       From         To check  total  holiday  JAN  FEB  MAR  APR  \
0   1  27-Jan-20  28-Mar-20     a     45        4    5   20   20    0
1   2  28-Mar-20  12-Apr-20     a      9        4    0    0    2    7
2   3  29-May-20  29-May-20     b      1        3    0    0    0    0
3   4   5-Apr-20  12-Apr-20     b      5        3    0    0    0    5

   MAY  JUN  JUL  AUG  SEP  OCT  NOV  DEC
0    0    0    0    0    0    0    0    0
1    0    0    0    0    0    0    0    0
2    1    0    0    0    0    0    0    0
3    0    0    0    0    0    0    0    0
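The answer's function subtracts a day whenever the holiday's month appears in the range. As a possible variation, not part of the original answer, pd.bdate_range can take the holiday dates directly when freq='C' (custom business day), so a holiday is skipped only if it actually falls on a weekday between From and To. The sketch below assumes the sample data from the question; split_by_month, fmt, and the mixed-case month labels are hypothetical names chosen to match the expected output.

```python
import pandas as pd

# Sample data copied from the tables in the question.
df1 = pd.DataFrame({
    "no": [1, 2, 3, 4],
    "From": ["27-Jan-20", "28-Mar-20", "29-May-20", "5-Apr-20"],
    "To": ["28-Mar-20", "12-Apr-20", "29-May-20", "12-Apr-20"],
    "check": ["a", "a", "b", "b"],
})
df2 = pd.DataFrame({"col1": ["a", "b"], "col2": ["9-Apr-20", "30-Mar-20"]})

fmt = "%d-%b-%y"  # assumed format of the date strings (English month names)
df = df1.copy()
df["From"] = pd.to_datetime(df["From"], format=fmt)
df["To"] = pd.to_datetime(df["To"], format=fmt)
# Series mapping each 'check' key to its holiday date.
holidays = pd.to_datetime(df2.set_index("col1")["col2"], format=fmt)

months = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()

def split_by_month(row):
    """Count working days per month for one row, skipping its holiday."""
    days = pd.bdate_range(
        start=row["From"],
        end=row["To"],
        freq="C",  # custom business days: Mon-Fri minus the given holidays
        holidays=[holidays[row["check"]]],
    )
    counts = pd.Series(days.month).value_counts()
    return pd.Series(
        {name: int(counts.get(num, 0)) for num, name in enumerate(months, start=1)}
    )

monthly = df.apply(split_by_month, axis=1)
result = pd.concat([df, monthly], axis=1)
# The total is simply the sum of the twelve monthly counts.
result.insert(4, "total", monthly.sum(axis=1))
print(result)
```

Because the total is derived from the monthly counts, both parts of the question come from a single pass over each date range.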