
How to divide a value by the number of days in the month and create a day column in Pandas?

I have a pandas Dataframe that looks like this:

   year  month  name  value1  value2
0  2021    7    cars   5000    4000 
1  2021    7   boats   2000     250
2  2021    9    cars   3000    7000

And I want it to look like this:

    year  month day  name  value1  value2
0   2021    7    1   cars  161.29  129.03
1   2021    7    2   cars  161.29  129.03
2   2021    7    3   cars  161.29  129.03
3   2021    7    4   cars  161.29  129.03
              ...
31  2021    7    1   boats  64.51   8.064
32  2021    7    2   boats  64.51   8.064
33  2021    7    3   boats  64.51   8.064
              ...
62  2021    9    1    cars   100    233.33
63  2021    9    2    cars   100    233.33
64  2021    9    3    cars   100    233.33

The idea is that I want to divide the value columns by the number of days in the month and create a day column, so that in the end I can build a date column by combining year, month and day.

Can anyone help me?

One option would be to use monthrange from calendar to get the number of days in a given month, divide the value by days in the month, then use Index.repeat to scale up the DataFrame and groupby cumcount to add in the Days:

from calendar import monthrange

import pandas as pd

df = pd.DataFrame(
    {'year': {0: 2021, 1: 2021, 2: 2021}, 'month': {0: 7, 1: 7, 2: 9},
     'name': {0: 'cars', 1: 'boats', 2: 'cars'},
     'value1': {0: 5000, 1: 2000, 2: 3000},
     'value2': {0: 4000, 1: 250, 2: 7000}})
days_in_month = (
    df[['year', 'month']].apply(lambda x: monthrange(*x)[1], axis=1)
)

# Calculate new values (assigning whole columns lets the integer columns become floats)
value_cols = ['value1', 'value2']
df[value_cols] = df[value_cols].div(days_in_month, axis=0)
df = df.loc[df.index.repeat(days_in_month)]  # Scale Up DataFrame
df.insert(2, 'day', df.groupby(level=0).cumcount() + 1)  # Add Days Column
df = df.reset_index(drop=True)  # Clean up Index

df :

    year  month  day  name      value1      value2
0   2021      7    1  cars  161.290323  129.032258
1   2021      7    2  cars  161.290323  129.032258
2   2021      7    3  cars  161.290323  129.032258
3   2021      7    4  cars  161.290323  129.032258
4   2021      7    5  cars  161.290323  129.032258
..   ...    ...  ...   ...         ...         ...
87  2021      9   26  cars  100.000000  233.333333
88  2021      9   27  cars  100.000000  233.333333
89  2021      9   28  cars  100.000000  233.333333
90  2021      9   29  cars  100.000000  233.333333
91  2021      9   30  cars  100.000000  233.333333
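
Since the stated goal is a single date column, one possible follow-up is to let pandas compute the month lengths and assemble the dates itself. This is only a sketch under the assumption that the columns are named as in the question; expand_to_days and the date column are hypothetical names introduced for illustration:

```python
import pandas as pd


def expand_to_days(df, value_cols):
    """Spread monthly totals evenly over the days of each month."""
    # First day of each (year, month) row; to_datetime assembles dates
    # from columns named year, month and day.
    month_start = pd.to_datetime(df[['year', 'month']].assign(day=1))
    days_in_month = month_start.dt.days_in_month

    out = df.copy()
    out[value_cols] = out[value_cols].div(days_in_month, axis=0)
    # Repeat each row once per day of its month.
    out = out.loc[out.index.repeat(days_in_month)]
    out.insert(2, 'day', out.groupby(level=0).cumcount() + 1)
    out = out.reset_index(drop=True)
    # Combine year, month and day into one datetime column.
    out['date'] = pd.to_datetime(out[['year', 'month', 'day']])
    return out


df = pd.DataFrame({
    'year': [2021, 2021, 2021],
    'month': [7, 7, 9],
    'name': ['cars', 'boats', 'cars'],
    'value1': [5000, 2000, 3000],
    'value2': [4000, 250, 7000],
})
daily = expand_to_days(df, ['value1', 'value2'])
print(daily.head())
```

Using dt.days_in_month instead of calendar.monthrange keeps the month-length lookup vectorized; the rest follows the same repeat and cumcount steps as above.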

For that you can create a temporary DataFrame that lists every day of the year, merge it with your data, and then divide the values.

Let's assume that your data covers a single year, so we can create the date range from it straight away and build the temporary DataFrame:

year = int(df.loc[0, 'year'])
dt_range = pd.DataFrame(pd.date_range(f'{year}-01-01', f'{year}-12-31'))
dt_range.columns = ['dte']
dt_range['year'] = dt_range['dte'].dt.year
dt_range['month'] = dt_range['dte'].dt.month
dt_range['day'] = dt_range['dte'].dt.day

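Now we can create the new DataFrame, with one row per day for each original row:

new_df = pd.merge(df, dt_range, how='left', on=['year', 'month'])

Now all we have to do is divide each value by the number of days in its month, and we have what you needed:

value_cols = ['value1', 'value2']
new_df[value_cols] = new_df[value_cols].div(new_df['dte'].dt.days_in_month, axis=0)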
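
If the data can span more than one year, the single-year assumption can be relaxed by building the calendar from the smallest to the largest year present. A minimal sketch of the merge approach under that assumption, using the question's column names (calendar, merged and the date column are illustrative names, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2021, 2021, 2021],
    'month': [7, 7, 9],
    'name': ['cars', 'boats', 'cars'],
    'value1': [5000, 2000, 3000],
    'value2': [4000, 250, 7000],
})

# Calendar table: one row per day, covering every year present in df.
first_day = f"{df['year'].min()}-01-01"
last_day = f"{df['year'].max()}-12-31"
calendar = pd.DataFrame({'date': pd.date_range(first_day, last_day, freq='D')})
calendar['year'] = calendar['date'].dt.year
calendar['month'] = calendar['date'].dt.month
calendar['day'] = calendar['date'].dt.day

# The left merge duplicates each monthly row once per day of that month.
merged = df.merge(calendar, how='left', on=['year', 'month'])

# Divide the monthly totals by the length of the month each row falls in.
value_cols = ['value1', 'value2']
merged[value_cols] = merged[value_cols].div(
    merged['date'].dt.days_in_month, axis=0
)
print(merged.head())
```

Building the range from explicit start and end dates also avoids hard-coding 365 days, which would drop December 31 in a leap year.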

You can use resample to upsample months into days:

import pandas as pd

df = pd.DataFrame([[2021,7,5000]], columns=['year', 'month', 'value'])

# create datetime column as period
df['datetime'] = pd.to_datetime(df['month'].astype(str) + '/' + df['year'].astype(str)).dt.to_period("M")

# calculate values per day by dividing the value by number of days per month
df['ndays'] = df['datetime'].dt.days_in_month
df['value'] = df['value'] / df['ndays']

# set datetime as index and resample:
df = df[['value', 'datetime']].set_index('datetime')
df = df.resample('d').ffill().reset_index()

# split datetime into separate columns
df['day'] = df['datetime'].dt.day
df['month'] = df['datetime'].dt.month
df['year'] = df['datetime'].dt.year
df.drop(columns=['datetime'], inplace=True)

Output (first rows):

    value  day  month  year
0  161.29    1      7  2021
1  161.29    2      7  2021
2  161.29    3      7  2021
3  161.29    4      7  2021
4  161.29    5      7  2021
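
The example above uses a single row. For the data in the question, where several names share the same month, one alternative that does not rely on resample is to attach the list of days to each row and then explode it. This is a different technique from the one in this answer, shown only as a sketch; the date column and the variable names are assumptions made for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2021, 2021, 2021],
    'month': [7, 7, 9],
    'name': ['cars', 'boats', 'cars'],
    'value1': [5000, 2000, 3000],
    'value2': [4000, 250, 7000],
})

# First day of each row's month.
month_start = pd.to_datetime(df[['year', 'month']].assign(day=1))

daily = df.copy()
value_cols = ['value1', 'value2']
daily[value_cols] = daily[value_cols].div(month_start.dt.days_in_month, axis=0)

# Attach the list of all days of the month to each row ...
daily['date'] = pd.Series(
    [list(pd.date_range(start, periods=start.days_in_month, freq='D'))
     for start in month_start],
    index=df.index,
)
# ... and turn every list element into its own row.
daily = daily.explode('date', ignore_index=True)
daily['date'] = pd.to_datetime(daily['date'])
daily['day'] = daily['date'].dt.day
print(daily.head())
```

Each original row is handled independently, so duplicate months across different names do not interfere with one another.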

I assume the DataFrame can contain more months, so here is your initial DataFrame extended a little:

from io import StringIO

import pandas as pd

df = pd.read_csv(StringIO("""
year    month   value
2021    7   5000
2021    8   5000
2021    9   5000
"""), sep=r"\s+")

This gives the DataFrame df:

   year  month  value
0  2021      7   5000
1  2021      8   5000
2  2021      9   5000

The solution is a single chained expression: first a datetime index is created from the raw year and month columns, then the resample method converts months to days, and finally value is overwritten with the per-day average for each month:

df_out = (
     df.set_index(pd.DatetimeIndex(pd.to_datetime(dict(year=df.year, month=df.month, day=1)), freq="MS"))
     .resample('D')
     .ffill()
     .assign(value = lambda df: df.value/df.index.days_in_month)
     )

Resulting dataframe:

            year  month       value
2021-07-01  2021      7  161.290323
2021-07-02  2021      7  161.290323
2021-07-03  2021      7  161.290323
2021-07-04  2021      7  161.290323
2021-07-05  2021      7  161.290323
         ...    ...         ...
2021-08-28  2021      8  161.290323
2021-08-29  2021      8  161.290323
2021-08-30  2021      8  161.290323
2021-08-31  2021      8  161.290323
2021-09-01  2021      9  166.666667

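Please note that September has only 30 days, so its value differs from the previous months. Also note that resample stops at the last index entry, so only the first day of the final month (2021-09-01) appears in the result.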
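
If every day of the final month is needed as well, one possible workaround is to reindex onto a daily range that runs to the end of the last month instead of resampling. The sketch below assumes the months are sorted in increasing order (reindex with method='ffill' requires a monotonic index); full_days and monthly are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2021, 2021, 2021],
    'month': [7, 8, 9],
    'value': [5000, 5000, 5000],
})

# Index each row by the first day of its month.
idx = pd.DatetimeIndex(
    pd.to_datetime(dict(year=df.year, month=df.month, day=1))
)
monthly = df.set_index(idx)

# Daily range from the first month start to the END of the last month;
# MonthEnd(0) rolls a date forward to the last day of its month.
full_days = pd.date_range(
    idx.min(), idx.max() + pd.offsets.MonthEnd(0), freq='D'
)

# Forward-fill each month's row across its days, then take the daily share.
df_out = monthly.reindex(full_days, method='ffill')
df_out['value'] = df_out['value'] / df_out.index.days_in_month
print(df_out.tail())
```

With this variant the result ends on 2021-09-30 rather than 2021-09-01, and the daily values of each month add back up to the monthly total.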
