Business days between two columns of dates with Pandas Groupby

I have a DataFrame in Pandas with a letter and two dates as columns. I would like to calculate the difference between the two date columns for the previous row using shift(1), provided that the Letter value is the same (using a groupby). The complex part is that I would like to calculate business days, not just elapsed days. The best way I have found to do that is numpy.busday_count, which takes two lists as arguments. I am essentially trying to use .apply to make each row its own list. I'm not sure if this is the best way to do it, but I'm running into some problems, and the error is ambiguous.
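For reference, here is a minimal sketch of how np.busday_count treats its two list arguments. The lists are paired element by element, and each pair yields the number of weekdays (Monday to Friday, the default weekmask) from the begin date up to, but not including, the end date. The dates below are taken from the question's first two rows.

```python
from datetime import date

import numpy as np

# Begin/end pairs from the question's first two rows
begins = [date(2016, 1, 7), date(2016, 3, 1)]
ends = [date(2016, 1, 9), date(2016, 3, 8)]

# One count per (begin, end) pair; the begin date is counted, the end date is not
counts = np.busday_count(begins, ends)
print(counts)  # [2 5]

# Thu 2016-01-07 and Fri 2016-01-08 are counted; Sat 2016-01-09 is the excluded end
single = np.busday_count(date(2016, 1, 7), date(2016, 1, 9))
print(single)  # 2
```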

import pandas as pd
from datetime import datetime
import numpy as np

# create dataframe
df = pd.DataFrame(data=[['A', datetime(2016,01,07), datetime(2016,01,09)],
                        ['A', datetime(2016,03,01), datetime(2016,03,8)],
                        ['B', datetime(2016,05,01), datetime(2016,05,10)],
                        ['B', datetime(2016,06,05), datetime(2016,06,07)]],
                   columns=['Letter', 'First Day', 'Last Day'])

# convert to datetime.date objects, since pandas stores the columns as Timestamps
df['First Day'] = df['First Day'].apply(lambda x: x.to_pydatetime().date())
df['Last Day'] = df['Last Day'].apply(lambda x: x.to_pydatetime().date())

df['Gap'] = (df.groupby('Letter')
                         .apply(
                                lambda x: (
                                            np.busday_count(x['First Day'].shift(1).tolist(),
                                                            x['Last Day'].shift(1).tolist())))
                         .reset_index(drop=True))
print(df)

I get the following error in the lambda function. I'm not sure which object it is having problems with, as the two arguments passed should be dates:

ValueError: Could not convert object to NumPy datetime
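A likely cause of this error, shown in a small sketch that assumes object columns of datetime.date values as produced by the .date() conversion above: shift(1) leaves a missing value (NaN) in the first row of each group, and np.busday_count cannot turn that missing value into a date, so it raises ValueError. The exact wording of the message may differ between NumPy versions.

```python
from datetime import date

import numpy as np
import pandas as pd

# Object-dtype columns of datetime.date values, as after the question's .date() conversion
first = pd.Series([date(2016, 1, 7), date(2016, 3, 1)])
last = pd.Series([date(2016, 1, 9), date(2016, 3, 8)])

# shift(1) moves every value down one row and leaves a missing value in row 0
shifted_first = first.shift(1).tolist()
shifted_last = last.shift(1).tolist()
print(shifted_first)

error = None
try:
    np.busday_count(shifted_first, shifted_last)
except ValueError as exc:  # the missing value in row 0 is not a valid date
    error = exc
print("busday_count raised:", error)
```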

Desired output:

  Letter   First Day    Last Day   Gap
0      A  2016-01-07  2016-01-09  NaN
1      A  2016-03-01  2016-03-08  1
2      B  2016-05-01  2016-05-10  NaN
3      B  2016-06-05  2016-06-07  7

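The following should work. First, remove the leading zeros from the date literals (Python 3 rejects integer literals such as 01 and 07, and Python 2 reads them as octal, where 08 and 09 are invalid). Then count the business days for every row before shifting, so np.busday_count never sees the missing value that shift() puts in the first row of each group: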

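import numpy as np
import pandas as pd
from datetime import datetime

df = pd.DataFrame(data=[['A', datetime(2016, 1, 7), datetime(2016, 1, 9)],
                        ['A', datetime(2016, 3, 1), datetime(2016, 3, 8)],
                        ['B', datetime(2016, 5, 1), datetime(2016, 5, 10)],
                        ['B', datetime(2016, 6, 5), datetime(2016, 6, 7)]],
                  columns=['Letter', 'First Day', 'Last Day'])

# Count business days for each row of the group, then shift the counts down
# one row; the outer parentheses let the method chain span several lines
df['Gap'] = (df.groupby('Letter')
               .apply(lambda x: pd.DataFrame(
                   np.busday_count(x['First Day'].tolist(),
                                   x['Last Day'].tolist())).shift())
               .reset_index(drop=True))
print(df)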

  Letter  First Day   Last Day  Gap
0      A 2016-01-07 2016-01-09  NaN
1      A 2016-03-01 2016-03-08  2.0
2      B 2016-05-01 2016-05-10  NaN
3      B 2016-06-05 2016-06-07  6.0
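An alternative sketch that avoids groupby().apply, offered as one option rather than as the answer's method: count business days for all rows in one vectorized np.busday_count call, then let groupby(...).shift() move each count down one row within its Letter group. It assumes the date columns have a pandas datetime64 dtype, as they do when built from datetime objects, and casts them to datetime64[D] first, since np.busday_count works at day resolution and may refuse to cast a finer-resolution array on its own.

```python
from datetime import datetime

import numpy as np
import pandas as pd

df = pd.DataFrame(data=[['A', datetime(2016, 1, 7), datetime(2016, 1, 9)],
                        ['A', datetime(2016, 3, 1), datetime(2016, 3, 8)],
                        ['B', datetime(2016, 5, 1), datetime(2016, 5, 10)],
                        ['B', datetime(2016, 6, 5), datetime(2016, 6, 7)]],
                  columns=['Letter', 'First Day', 'Last Day'])

# Cast to day resolution, which is what np.busday_count operates on
first = df['First Day'].values.astype('datetime64[D]')
last = df['Last Day'].values.astype('datetime64[D]')

# Business days for every row at once, aligned to df's index
counts = pd.Series(np.busday_count(first, last), index=df.index)

# Move each count down one row within its Letter group;
# the first row of each group has no predecessor and becomes NaN
df['Gap'] = counts.groupby(df['Letter']).shift()
print(df)
```

Because the precomputed Series is grouped by df['Letter'] and keeps df's index, no reset_index is needed, and the result stays aligned with df even if rows of the same letter are not adjacent.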

I don't think you need the .date() conversion; the code above passes the date columns to np.busday_count without it.
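A quick way to check that the .date() step is unnecessary, assuming datetime64 columns like the ones above: converting to datetime.date objects and casting the columns straight to datetime64[D] give np.busday_count the same days, so the counts match.

```python
from datetime import datetime

import numpy as np
import pandas as pd

dates = pd.DataFrame({'First Day': [datetime(2016, 1, 7), datetime(2016, 5, 1)],
                      'Last Day': [datetime(2016, 1, 9), datetime(2016, 5, 10)]})

# With a .date()-style conversion: lists of datetime.date objects
with_date = np.busday_count(dates['First Day'].dt.date.tolist(),
                            dates['Last Day'].dt.date.tolist())

# Without it: cast the datetime64 columns straight to day resolution
without_date = np.busday_count(dates['First Day'].values.astype('datetime64[D]'),
                               dates['Last Day'].values.astype('datetime64[D]'))

print(with_date, without_date)  # [2 6] [2 6]
```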
