
Smoothing by bin boundaries using pandas/numpy

I have formed the bins using the pandas.cut function. To smooth by bin boundaries, I calculate the minimum and maximum value of each bin using groupby.
Minimum values:

     date        births  with noise
bin
A    1959-01-31  23      19.921049
B    1959-01-02  27      25.921175
C    1959-01-01  30      32.064698
D    1959-01-08  35      38.507170
E    1959-01-05  41      45.022163
F    1959-01-13  47      51.821755
G    1959-03-27  56      59.416700
H    1959-09-23  73      70.140119

Maximum values:

     date        births  with noise
bin
A    1959-07-12  30      25.161292
B    1959-12-11  35      31.738422
C    1959-12-27  42      38.447807
D    1959-12-20  48      44.919703
E    1959-12-31  56      51.274550
F    1959-12-30  59      57.515927
G    1959-11-05  68      63.970382
H    1959-09-23  73      70.140119
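
For readers who want to reproduce this setup, here is a minimal sketch of the binning and groupby step. The six values, the bin count of 3, and the labels A to C are illustrative assumptions, not the original births data; the names df_min, df_max and df_mean are likewise chosen here for illustration.

```python
import pandas as pd

# Hypothetical stand-in values; the original dataset is not reproduced here.
df = pd.DataFrame({"with noise": [19.9, 25.2, 25.9, 31.7, 32.1, 38.4]})

# Three equal-width bins labelled A, B, C (bin count and labels are assumptions).
df["bin"] = pd.cut(df["with noise"], bins=3, labels=list("ABC"))

# Per-bin boundaries and mean, each a Series indexed by bin label.
grouped = df.groupby("bin", observed=True)["with noise"]
df_min = grouped.min()
df_max = grouped.max()
df_mean = grouped.mean()
```

Each result is indexed by bin label, so a row's boundaries can be looked up from its bin.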

Now I want to replace the values in my original dataframe: if a value is less than the mean of its bin, it should be replaced with that bin's minimum, and if it is greater than the mean, with that bin's maximum.
My dataframe looks like this:

   date        births  with noise  bin  smooth_val_mean
0  1959-01-01  35      36.964692   C    35.461173
1  1959-01-02  32      29.861393   B    29.592061
2  1959-01-03  30      27.268515   B    29.592061
3  1959-01-04  31      31.513148   B    29.592061
4  1959-01-05  44      46.194690   E    47.850101
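
To make the rule concrete, here is a small sketch that applies it by hand to the first three rows, using the bin B and C values from the tables above. It assumes smooth_val_mean is the per-bin mean of "with noise"; the helper name smooth is hypothetical, and leaving ties unchanged is an assumption, since the rule does not cover a value equal to the mean.

```python
# Values copied from the min, max and dataframe tables above.
bin_min = {"B": 25.921175, "C": 32.064698}
bin_max = {"B": 31.738422, "C": 38.447807}
bin_mean = {"B": 29.592061, "C": 35.461173}

# (with noise, bin) for rows 0, 1 and 2.
rows = [(36.964692, "C"), (29.861393, "B"), (27.268515, "B")]

def smooth(value, label):
    """Snap a value to its bin's min or max, depending on the bin mean."""
    mean = bin_mean[label]
    if value < mean:
        return bin_min[label]
    if value > mean:
        return bin_max[label]
    return value  # tie: left unchanged (assumption)

smoothed = [smooth(value, label) for value, label in rows]
# Rows 0 and 1 lie above their bin mean, row 2 below it:
# smoothed == [38.447807, 31.738422, 25.921175]
```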

How should I do this using pandas/numpy?

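Let's try this function. It assumes df_mean, df_min and df_max are the per-bin groupby results, indexed by bin label:

import numpy as np

def thresh(col):
    # Look up each row's bin statistics; astype(float) undoes the
    # categorical dtype that bins created by pd.cut can carry over.
    means = df['bin'].map(df_mean[col]).astype(float)
    mins = df['bin'].map(df_min[col]).astype(float)
    maxs = df['bin'].map(df_max[col]).astype(float)

    # +1 where the value is above its bin mean, -1 below, 0 when equal
    signs = np.sign(df[col] - means)

    # Above the mean -> bin max, below -> bin min, equal -> the mean itself
    df[f'{col}_smooth'] = np.select((signs==1, signs==-1), (maxs, mins), means)

for col in ['with noise']:
    thresh(col)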
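
As a possible alternative, the per-bin statistics can be broadcast back to the rows with groupby(...).transform, which avoids keeping separate lookup tables. This is a sketch under assumptions rather than the method above: the six-row dataframe is made-up illustrative data, and values equal to the bin mean are left unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the question's dataframe.
df = pd.DataFrame({
    "with noise": [36.9, 29.8, 27.2, 31.5, 46.1, 33.0],
    "bin": ["C", "B", "B", "B", "E", "C"],
})

col = "with noise"
grouped = df.groupby("bin")[col]

# transform returns one value per original row, aligned with df's index.
means = grouped.transform("mean")
mins = grouped.transform("min")
maxs = grouped.transform("max")

# Below the bin mean -> bin min, above -> bin max, equal -> unchanged.
df[f"{col}_smooth"] = np.select(
    [df[col] < means, df[col] > means],
    [mins, maxs],
    default=df[col],
)
```

In this example the single value in bin E equals its own bin mean, so it falls through to the default and stays as it is.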
