
Update values in a dataframe at every nth value with a group by

I have a dataframe with a column 'snail_chart_dates' that I would like to populate with some values without using a loop.

The values I would like are the first 'value_date', the last 'value_date', and the middle 'value_date' for each code_id group.

I think this could be done with a combination of .iloc[0] and .iloc[-1], but I have no idea how to apply this to a groupby or how to get the middle value.

Example trimmed dataframe (the real one is much longer):


    import pandas as pd
    import numpy as np 

    returns = pd.DataFrame({
            'value_date' : ['2018-01-31', '2018-02-28', '2018-03-31','2018-04-30', '2018-05-31', '2018-06-30', 
                            '2018-01-31', '2018-02-28', '2018-03-31','2018-04-30', '2018-05-31', '2018-06-30'],
            'code_id' :  ['AUD','AUD','AUD','AUD','AUD','AUD', 
                          'USD','USD','USD','USD','USD','USD'],
            'gross_return': [.01, .02, .03, -.4, -.06, -.02, 
                             .06, .8, .9, .4, -1.06, .03],
            'bm_return': [.01, .02, .03, -.4, -.06, -.02, 
                             .06, .8, .9, .4, -1.06, .03],
            })


    returns["snail_chart_dates"] = ""

Example desired result - the first and last value_date per code_id filled into snail_chart_dates (middle not shown).

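(As an aside, the .iloc idea mentioned in the question can be applied per group via groupby().apply. Below is only a rough sketch against the example frame above; the helper name first_middle_last is made up for illustration.)

    # Sketch only: pick the first, middle and last row of each group by position
    def first_middle_last(g):
        pos = [0, len(g) // 2, len(g) - 1]          # positional locations within the group
        out = pd.Series("", index=g.index, dtype=object)
        out.iloc[pos] = g['value_date'].iloc[pos]   # copy the dates found at those positions
        return out

    returns['snail_chart_dates'] = (
        returns.groupby('code_id', group_keys=False).apply(first_middle_last)
    )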

First, we get the min and max date per group.

Then we get the index of each of these values with idxmin and idxmax.

To get the middle values, we take the median of each group's index and round up with np.ceil.

Finally, we assign these values to our new column with loc:

    # Minimum and maximum value_date per group, broadcast back to every row
    grp = returns.groupby('code_id')
    s1 = grp['value_date'].transform('min')
    s2 = grp['value_date'].transform('max')

    # Middle row of each group: the median of the group's index labels, rounded up
    s3 = grp['value_date'].apply(lambda x: int(np.ceil(np.median(x.index)))).values

    # Index labels of the minimum and maximum date in each group
    idx_min = grp['value_date'].idxmin().values
    idx_max = grp['value_date'].idxmax().values

    returns.loc[idx_min, 'snail_chart_dates'] = s1.loc[idx_min]
    returns.loc[idx_max, 'snail_chart_dates'] = s2.loc[idx_max]
    returns.loc[s3, 'snail_chart_dates'] = returns.loc[s3, 'value_date']

       value_date code_id  gross_return  bm_return snail_chart_dates
    0  2018-01-31     AUD          0.01       0.01        2018-01-31
    1  2018-02-28     AUD          0.02       0.02               NaT
    2  2018-03-31     AUD          0.03       0.03               NaT
    3  2018-04-30     AUD         -0.40      -0.40        2018-04-30
    4  2018-05-31     AUD         -0.06      -0.06               NaT
    5  2018-06-30     AUD         -0.02      -0.02        2018-06-30
    6  2018-01-31     USD          0.06       0.06        2018-01-31
    7  2018-02-28     USD          0.80       0.80               NaT
    8  2018-03-31     USD          0.90       0.90               NaT
    9  2018-04-30     USD          0.40       0.40        2018-04-30
    10 2018-05-31     USD         -1.06      -1.06               NaT
    11 2018-06-30     USD          0.03       0.03        2018-06-30
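If you prefer a single assignment, the three sets of index labels collected above can also be combined into one mask (just a compact variant of the same idea):

    # idx_min, idx_max and s3 together hold the first/last/middle row labels per group
    keep = returns.index.isin(np.concatenate([idx_min, idx_max, s3]))
    returns.loc[keep, 'snail_chart_dates'] = returns.loc[keep, 'value_date']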

Assuming you have converted value_date to the pandas datetime format, there is a very simple solution.

We can group by code_id and use the .quantile() function to get our dates.

returns["value_date"] = pd.to_datetime(returns["value_date"])
code_groups = returns.groupby("code_id")["value_date"]
code_groups.quantile(0, interpolation="nearest")
code_id
AUD   2018-01-31
USD   2018-01-31
Name: value_date, dtype: datetime64[ns]
code_groups.quantile(0.5, interpolation="nearest")
code_id
AUD   2018-03-31
USD   2018-03-31
Name: value_date, dtype: datetime64[ns]

You can then assign this information back into your dataframe however you wish.
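For instance, here is one possible way to do that (a sketch, not from the answer above; the quantile(1) call for the last date and the map-based matching are additions):

    # First, middle and last date per group (quantile(1) picks the last one)
    firsts = code_groups.quantile(0, interpolation="nearest")
    middles = code_groups.quantile(0.5, interpolation="nearest")
    lasts = code_groups.quantile(1, interpolation="nearest")

    # A row is kept if its value_date equals its group's first, middle or last date
    is_key = (
        returns["value_date"].eq(returns["code_id"].map(firsts))
        | returns["value_date"].eq(returns["code_id"].map(middles))
        | returns["value_date"].eq(returns["code_id"].map(lasts))
    )
    returns["snail_chart_dates"] = returns["value_date"].where(is_key)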
