
Add a new column to pandas dataframe with increment dates within groups

I have a pandas dataframe (df) that is grouped by an index value, with repeated rows.

I would like to add a new column with a date for each row, covering each day between df.start_date and df.end_date within each group:

From this:

index    start_date   end_date    number_of_days  value
0        8/12/2019    10/12/2019  3               2091
0        8/12/2019    10/12/2019  3               2091
0        8/12/2019    10/12/2019  3               2091
1        11/12/2019   14/12/2019  4               2214
1        11/12/2019   14/12/2019  4               2214
1        11/12/2019   14/12/2019  4               2214
1        11/12/2019   14/12/2019  4               2214
2        15/12/2019   16/12/2019  2               8430
2        15/12/2019   16/12/2019  2               8430

To this:

index    start_date   end_date    number_of_days  value  NEW_COLUMN
0        8/12/2019    10/12/2019  3               2091   8/12/2019
0        8/12/2019    10/12/2019  3               2091   9/12/2019
0        8/12/2019    10/12/2019  3               2091   10/12/2019
1        11/12/2019   14/12/2019  4               2214   11/12/2019
1        11/12/2019   14/12/2019  4               2214   12/12/2019 
1        11/12/2019   14/12/2019  4               2214   13/12/2019
1        11/12/2019   14/12/2019  4               2214   14/12/2019
2        15/12/2019   16/12/2019  2               8430   15/12/2019
2        15/12/2019   16/12/2019  2               8430   16/12/2019

So far I have been able to create a date range with pd.date_range, and then tried to add that range as a new column to the dataframe:

# generate a daily date range based on min and max values
date_range_to_add = pd.date_range(start=min(df['start_date']),
                                  end=max(df['end_date']), freq='D')

repeated_dw_data['dates'] = date_range_to_add

However, I get an error saying that the length of values does not match the length of the index, presumably because the generated date range is shorter than the full list of rows.

I think I am going to have to write a function that generates a date range for each group, adds it, and then appends each group to a new dataframe. But before I go down this road, is there a simpler route that I'm missing? What is the best way to iterate over each group and apply this logic?
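For reference, the per-group route described above could look roughly like the sketch below. It assumes one row per group (from the repeated frame this could be taken with something like groupby(level=0).first()), and the names groups, pieces, expanded and the date column are made up for illustration:

```python
import pandas as pd

# One row per group, using the sample data from the question.
groups = pd.DataFrame(
    {
        "start_date": ["8/12/2019", "11/12/2019", "15/12/2019"],
        "end_date": ["10/12/2019", "14/12/2019", "16/12/2019"],
        "value": [2091, 2214, 8430],
    },
    index=pd.Index([0, 1, 2], name="index"),
)

# The dates are day-first (8/12/2019 is 8 December 2019).
for col in ["start_date", "end_date"]:
    groups[col] = pd.to_datetime(groups[col], dayfirst=True)

# Build one small frame per group; the scalar columns are broadcast
# to the length of that group's daily date range.
pieces = [
    pd.DataFrame(
        {
            "index": key,
            "value": value,
            "date": pd.date_range(start, end, freq="D"),
        }
    )
    for key, start, end, value in zip(
        groups.index, groups["start_date"], groups["end_date"], groups["value"]
    )
]

# Stack the per-group frames and restore the group key as the index.
expanded = pd.concat(pieces, ignore_index=True).set_index("index")
print(expanded)
```

This does not depend on how many repeated rows each group already has, since the rows are generated from the date range itself.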

If the number of rows in each group (per index value) equals the number_of_days column, one solution is to build a per-group day counter with GroupBy.cumcount, convert it to timedeltas with to_timedelta, and add it to the start_date column:

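import pandas as pd

# parse start_date as day-first dates (8/12/2019 is 8 December 2019)
repeated_dw_data['start_date'] = pd.to_datetime(repeated_dw_data['start_date'], dayfirst=True)

# 0, 1, 2, ... within each index group, converted to a number of days
td = pd.to_timedelta(repeated_dw_data.groupby(level=0).cumcount(), unit='d')
repeated_dw_data['new'] = repeated_dw_data['start_date'] + td
print(repeated_dw_data)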
      start_date    end_date  number_of_days  value        new
index                                                         
0     2019-12-08  10/12/2019               3   2091 2019-12-08
0     2019-12-08  10/12/2019               3   2091 2019-12-09
0     2019-12-08  10/12/2019               3   2091 2019-12-10
1     2019-12-11  14/12/2019               4   2214 2019-12-11
1     2019-12-11  14/12/2019               4   2214 2019-12-12
1     2019-12-11  14/12/2019               4   2214 2019-12-13
1     2019-12-11  14/12/2019               4   2214 2019-12-14
2     2019-12-15  16/12/2019               2   8430 2019-12-15
2     2019-12-15  16/12/2019               2   8430 2019-12-16
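
To try this end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the same cumcount idea. The frame is named df here, and the final check (last_new, last_end, groups_ok) is an addition for illustration: it only confirms that each group has as many rows as days between start_date and end_date, which is the assumption this approach relies on.

```python
import pandas as pd

# Rebuild the sample data from the question; "index" is the group key.
df = pd.DataFrame(
    {
        "start_date": ["8/12/2019"] * 3 + ["11/12/2019"] * 4 + ["15/12/2019"] * 2,
        "end_date": ["10/12/2019"] * 3 + ["14/12/2019"] * 4 + ["16/12/2019"] * 2,
        "number_of_days": [3] * 3 + [4] * 4 + [2] * 2,
        "value": [2091] * 3 + [2214] * 4 + [8430] * 2,
    },
    index=pd.Index([0] * 3 + [1] * 4 + [2] * 2, name="index"),
)

# Parse both date columns as day-first dates.
for col in ["start_date", "end_date"]:
    df[col] = pd.to_datetime(df[col], dayfirst=True)

# 0, 1, 2, ... within each group, converted to a day offset.
offset = pd.to_timedelta(df.groupby(level=0).cumcount(), unit="D")
df["new"] = df["start_date"] + offset

# Sanity check: the last generated date in each group should equal end_date.
last_new = df.groupby(level=0)["new"].max()
last_end = df.groupby(level=0)["end_date"].first()
groups_ok = bool((last_new == last_end).all())
print(df)
print(groups_ok)
```

If groups_ok comes out False on real data, some group has more or fewer rows than days in its range, and the counter-based dates will not line up with end_date.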
