
How do I offset Pandas dayofyear so start date is 1st October not 1st January?

I have the following dataframe (given in table form below).

In this case, I want the year to be from 1st October to 30th Sep, and it would need to take into account leap years.

The desired output keeps every column unchanged except dayofyear, which should restart at 1 on 1st October.

Here is the dataframe in table form:

           Day  stock  dayofyear  weekday  month  year  leapyear
0   24/09/2019     10        267        1      9  2019     False
1   25/09/2019     10        268        2      9  2019     False
2   26/09/2019     11        269        3      9  2019     False
3   27/09/2019     12        270        4      9  2019     False
4   28/09/2019     14        271        5      9  2019     False
5   29/09/2019     14        272        6      9  2019     False
6   30/09/2019     15        273        0      9  2019     False
7   01/10/2019     16        274        1     10  2019     False
8   02/10/2019     17        275        2     10  2019     False
9   03/10/2019     18        276        3     10  2019     False
10  04/10/2019     19        277        4     10  2019     False

Use:

import numpy as np
import pandas as pd

df['Day'] = pd.to_datetime(df['Day'], dayfirst=True)

# Calendar year in which the current October-to-September year began
base_year = np.where(df['month'].ge(10), df['year'], df['year'].sub(1))
base_date = pd.to_datetime(base_year, format='%Y') + pd.DateOffset(months=9)
df['dayofyear'] = (df['Day'] - base_date).dt.days.add(1)

Details:

Convert the Day column to a pandas datetime series with pd.to_datetime, then use np.where together with Series.ge and Series.sub to compute the base_year for each date: dates in October or later keep their own year, and earlier dates take the previous year.

print(base_year)
array([2018, 2018, 2018, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019])
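
If the frame does not already carry the month and year helper columns, the same base year can probably be derived from the Day column itself. A minimal sketch, assuming the dates have already been parsed; the `days` series is made-up sample data, not part of the question:

```python
import numpy as np
import pandas as pd

# Hypothetical sample dates on either side of the 1 October boundary
days = pd.Series(pd.to_datetime(['30/09/2019', '01/10/2019'], dayfirst=True))

# October onward belongs to the year starting that October;
# earlier months belong to the year that started the previous October.
base_year = np.where(days.dt.month >= 10, days.dt.year, days.dt.year - 1)
print(base_year)
```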

Use pd.to_datetime to convert base_year to a DatetimeIndex of 1 January dates, then add an offset of 9 months so that each base_date falls on 1 October.

print(base_date)
DatetimeIndex(['2018-10-01', '2018-10-01', '2018-10-01', '2018-10-01',
               '2018-10-01', '2018-10-01', '2018-10-01', '2019-10-01',
               '2019-10-01', '2019-10-01', '2019-10-01'],
              dtype='datetime64[ns]', freq=None)
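
As an alternative to parsing the year and adding an offset, the anchor dates can also be assembled from year, month and day components. This is a sketch of a different route, not the method used in this answer; the `base_year` values are a shortened copy of the array printed earlier:

```python
import numpy as np
import pandas as pd

base_year = np.array([2018, 2019])  # sample values only

# pd.to_datetime accepts a DataFrame with year/month/day columns
parts = pd.DataFrame({'year': base_year, 'month': 10, 'day': 1})
base_date = pd.to_datetime(parts)
print(base_date)
```

Note that this form returns a Series rather than a DatetimeIndex, so a later subtraction from df['Day'] aligns on index labels; the two indexes would need to match.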

Subtract base_date from the Day column, take the number of whole days with Series.dt.days, and add 1 so that 1 October counts as day 1 of the dayofyear column:

print(df)
          Day  stock  dayofyear  weekday  month  year  leapyear
0  2019-09-24     10        359        1      9  2019     False
1  2019-09-25     10        360        2      9  2019     False
2  2019-09-26     11        361        3      9  2019     False
3  2019-09-27     12        362        4      9  2019     False
4  2019-09-28     14        363        5      9  2019     False
5  2019-09-29     14        364        6      9  2019     False
6  2019-09-30     15        365        0      9  2019     False
7  2019-10-01     16          1        1     10  2019     False
8  2019-10-02     17          2        2     10  2019     False
9  2019-10-03     18          3        3     10  2019     False
10 2019-10-04     19          4        4     10  2019     False
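
The sample data does not cross a leap day, so the leap-year requirement from the question can be checked separately. The sketch below wraps the same steps in a helper; the name `offset_dayofyear`, its `start_month` parameter and the test dates are illustrative assumptions, not part of the original answer:

```python
import numpy as np
import pandas as pd

def offset_dayofyear(dates: pd.Series, start_month: int = 10) -> pd.Series:
    """Day number counted from the 1st of start_month (hypothetical helper)."""
    # Year in which the current offset year began
    base_year = np.where(dates.dt.month >= start_month,
                         dates.dt.year, dates.dt.year - 1)
    # 1 January of base_year, shifted forward to the 1st of start_month
    base_date = (pd.to_datetime(base_year, format='%Y')
                 + pd.DateOffset(months=start_month - 1))
    return (dates - base_date).dt.days + 1

dates = pd.Series(pd.to_datetime(
    ['2019-09-30', '2019-10-01', '2020-02-29', '2020-09-30']))
result = offset_dayofyear(dates)
print(result.tolist())  # [365, 1, 152, 366]
```

Because the day number comes from real date subtraction, the year that contains 29 February 2020 ends on day 366, while the preceding one ends on day 365.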

