I have a pandas DataFrame that looks like this:
year month name value1 value2
0 2021 7 cars 5000 4000
1 2021 7 boats 2000 250
2 2021 9 cars 3000 7000
And I want it to look like this:
year month day name value1 value2
0 2021 7 1 cars 161.29 129.03
1 2021 7 2 cars 161.29 129.03
2 2021 7 3 cars 161.29 129.03
3 2021 7 4 cars 161.29 129.03
...
31 2021 7 1 boats 64.51 8.064
32 2021 7 2 boats 64.51 8.064
33 2021 7 3 boats 64.51 8.064
...
62 2021 9 1 cars 100 233.33
63 2021 9 2 cars 100 233.33
64 2021 9 3 cars 100 233.33
The idea is that I want to divide the value columns by the number of days in the month and create a day column, so that in the end I can build a date column by combining year, month and day.
Can anyone help me?
One option would be to use monthrange from calendar to get the number of days in a given month, divide the value by days in the month, then use Index.repeat to scale up the DataFrame and groupby cumcount to add in the Days:
from calendar import monthrange
import pandas as pd
df = pd.DataFrame(
{'year': {0: 2021, 1: 2021, 2: 2021}, 'month': {0: 7, 1: 7, 2: 9},
'name': {0: 'cars', 1: 'boats', 2: 'cars'},
'value1': {0: 5000, 1: 2000, 2: 3000},
'value2': {0: 4000, 1: 250, 2: 7000}})
days_in_month = (
df[['year', 'month']].apply(lambda x: monthrange(*x)[1], axis=1)
)
# Calculate new values
value_cols = ['value1', 'value2']
df[value_cols] = df[value_cols].div(days_in_month, axis=0)
df = df.loc[df.index.repeat(days_in_month)] # Scale Up DataFrame
df.insert(2, 'day', df.groupby(level=0).cumcount() + 1) # Add Days Column
df = df.reset_index(drop=True) # Clean up Index
df:
year month day name value1 value2
0 2021 7 1 cars 161.290323 129.032258
1 2021 7 2 cars 161.290323 129.032258
2 2021 7 3 cars 161.290323 129.032258
3 2021 7 4 cars 161.290323 129.032258
4 2021 7 5 cars 161.290323 129.032258
.. ... ... ... ... ... ...
87 2021 9 26 cars 100.000000 233.333333
88 2021 9 27 cars 100.000000 233.333333
89 2021 9 28 cars 100.000000 233.333333
90 2021 9 29 cars 100.000000 233.333333
91 2021 9 30 cars 100.000000 233.333333
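Since the stated goal is a single date column, one way to finish from here is to pass the year, month and day columns to pd.to_datetime, which can assemble a datetime from columns with those names. The sketch below uses a small made-up frame (called expanded here, a hypothetical name) in the same layout as the output above rather than reusing df:

```python
import pandas as pd

# Illustrative rows in the same layout as the expanded frame above
expanded = pd.DataFrame({
    'year': [2021, 2021, 2021],
    'month': [7, 7, 9],
    'day': [1, 2, 30],
    'name': ['cars', 'cars', 'cars'],
})

# to_datetime assembles a datetime from columns named year, month, day
expanded['date'] = pd.to_datetime(expanded[['year', 'month', 'day']])
print(expanded)
```

The same call should work on the full expanded frame, since it only needs those three columns to be present.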
For that you need to create a temporary dataframe that includes the days in each month, merge it with your data, and then divide the values.
Let's assume your data covers a single year, so we can create the date range from it straight away and build the temporary dataframe:
year = df.loc[0, 'year']
dt_range = pd.DataFrame(pd.date_range(f'{year}-01-01', f'{year}-12-31'))
dt_range.columns = ['dte']
dt_range['year'] = dt_range['dte'].dt.year
dt_range['month'] = dt_range['dte'].dt.month
dt_range['day'] = dt_range['dte'].dt.day
Now we can create the new dataframe:
new_df = pd.merge(df, dt_range, how='left', on=['year', 'month'])
Now all we have to do is divide each value by the number of days in its month, and we have what you needed:
n_days = new_df['dte'].dt.days_in_month
new_df[['value1', 'value2']] = new_df[['value1', 'value2']].div(n_days, axis=0)
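Put together as a self-contained script, the merge approach might look like the sketch below. It assumes all rows fall in one calendar year, uses the question's sample data, and divides by the days-in-month of each merged date, as the opening sentence of this answer describes. The names days, daily and totals are illustrative, and the final sanity check is an addition, not part of the answer:

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2021, 2021, 2021],
    'month': [7, 7, 9],
    'name': ['cars', 'boats', 'cars'],
    'value1': [5000, 2000, 3000],
    'value2': [4000, 250, 7000],
})

# One row per calendar day of the (single) year present in df
year = int(df.loc[0, 'year'])
days = pd.DataFrame({'dte': pd.date_range(f'{year}-01-01', f'{year}-12-31')})
days['year'] = days['dte'].dt.year
days['month'] = days['dte'].dt.month
days['day'] = days['dte'].dt.day

# The left merge repeats each monthly row once per day of that month
daily = df.merge(days, how='left', on=['year', 'month'])

# Spread each monthly total evenly over the days of its month
n_days = daily['dte'].dt.days_in_month
daily[['value1', 'value2']] = daily[['value1', 'value2']].div(n_days, axis=0)

# Sanity check: daily values should sum back to the monthly totals
totals = daily.groupby(['year', 'month', 'name'])[['value1', 'value2']].sum()
print(totals.round(2))
```

As a side effect, the dte column left over from the merge already is the date column the question ultimately asks for.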
You can use resample to upsample months into days:
import pandas as pd
df = pd.DataFrame([[2021,7,5000]], columns=['year', 'month', 'value'])
# create datetime column as period
df['datetime'] = pd.to_datetime(df['month'].astype(str) + '/' + df['year'].astype(str)).dt.to_period("M")
# calculate values per day by dividing the value by number of days per month
df['ndays'] = df['datetime'].apply(lambda x: x.days_in_month)
df['value'] = df['value'] / df['ndays']
# set datetime as index and resample:
df = df[['value', 'datetime']].set_index('datetime')
df = df.resample('d').ffill().reset_index()
#split datetime to separate columns
df['day'] = df['datetime'].dt.day
df['month'] = df['datetime'].dt.month
df['year'] = df['datetime'].dt.year
df.drop(columns=['datetime'], inplace=True)
| | value | day | month | year |
|---|---|---|---|---|
| 0 | 161.29 | 1 | 7 | 2021 |
| 1 | 161.29 | 2 | 7 | 2021 |
| 2 | 161.29 | 3 | 7 | 2021 |
| 3 | 161.29 | 4 | 7 | 2021 |
| 4 | 161.29 | 5 | 7 | 2021 |
I assume the dataframe can have more months, so here is your initial dataframe extended a little:
from io import StringIO
import pandas as pd
# sep matches any run of whitespace, so tabs or spaces both work
df = pd.read_csv(StringIO("""
year month value
2021 7 5000
2021 8 5000
2021 9 5000
"""), sep=r"\s+")
Which gives dataframe df:
year month value
0 2021 7 5000
1 2021 8 5000
2 2021 9 5000
The solution is a single chained expression: first a datetime index is created from the raw year and month, then the resample method is used to convert months to days, and finally value is overwritten with the per-day average for each month:
df_out = (
df.set_index(pd.DatetimeIndex(pd.to_datetime(dict(year=df.year, month=df.month, day=1)), freq="MS"))
.resample('D')
.ffill()
.assign(value = lambda df: df.value/df.index.days_in_month)
)
Resulting dataframe:
year month value
2021-07-01 2021 7 161.290323
2021-07-02 2021 7 161.290323
2021-07-03 2021 7 161.290323
2021-07-04 2021 7 161.290323
2021-07-05 2021 7 161.290323
... ... ...
2021-08-28 2021 8 161.290323
2021-08-29 2021 8 161.290323
2021-08-30 2021 8 161.290323
2021-08-31 2021 8 161.290323
2021-09-01 2021 9 166.666667
Please note September has only 30 days, so value is different than in previous months.
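In the result above the last row is 2021-09-01: resample only fills between the first and last timestamps in the index, so the final month ends up with a single day. One possible workaround, sketched here under the assumption that every month should be expanded in full, is to build the daily index explicitly up to the end of the last month and forward-fill onto it with reindex. The names monthly and full_days are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'year': [2021, 2021, 2021],
                   'month': [7, 8, 9],
                   'value': [5000, 5000, 5000]})

# Month-start timestamps built from the year and month columns
idx = pd.to_datetime(df[['year', 'month']].assign(day=1))
monthly = df.set_index(pd.DatetimeIndex(idx))

# Daily index running to the last day of the final month;
# MonthEnd(0) rolls a date forward to the end of its own month
full_days = pd.date_range(monthly.index.min(),
                          monthly.index.max() + pd.offsets.MonthEnd(0),
                          freq='D')

df_out = (
    monthly.reindex(full_days, method='ffill')  # repeat each month's row daily
           .assign(value=lambda d: d.value / d.index.days_in_month)
)
print(df_out.tail())
```

With this variant September contributes 30 rows, and the daily values of each month add back up to the monthly total.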