I have formed the bins using the pandas.cut function. To smooth by bin boundaries, I then calculated the minimum and maximum value of each bin using groupby.
Minimum values:
          date  births  with noise
bin
A   1959-01-31      23   19.921049
B   1959-01-02      27   25.921175
C   1959-01-01      30   32.064698
D   1959-01-08      35   38.507170
E   1959-01-05      41   45.022163
F   1959-01-13      47   51.821755
G   1959-03-27      56   59.416700
H   1959-09-23      73   70.140119
Maximum values:
          date  births  with noise
bin
A   1959-07-12      30   25.161292
B   1959-12-11      35   31.738422
C   1959-12-27      42   38.447807
D   1959-12-20      48   44.919703
E   1959-12-31      56   51.274550
F   1959-12-30      59   57.515927
G   1959-11-05      68   63.970382
H   1959-09-23      73   70.140119
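For reference, the binning and per-bin statistics could be produced roughly as in the sketch below. The data here is randomly generated stand-in data, and the choice of eight equal-width bins labelled A to H is an assumption; only the column names and the names df_min, df_max and df_mean follow the post.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data with the same column names as in the post.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "date": pd.date_range("1959-01-01", periods=365, freq="D"),
    "births": rng.integers(23, 74, size=365),
})
df["with noise"] = df["births"] + rng.normal(0, 3, size=365)

# Assumption: eight equal-width bins labelled A..H.
df["bin"] = pd.cut(df["with noise"], bins=8, labels=list("ABCDEFGH"))

# Per-bin statistics, indexed by bin label.
grouped = df.groupby("bin", observed=True)
df_min = grouped.min()    # column-wise minimum within each bin
df_max = grouped.max()    # column-wise maximum within each bin
df_mean = grouped[["births", "with noise"]].mean()
```

Note that min() and max() are taken independently for each column, so a row of df_min does not correspond to a single row of the original data.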
Now I want to replace the values in my original dataframe. If the value is less than the mean (of its bin) then it is replaced with the min value (for that bin), and if it is greater than the mean then it is replaced with the max value.
My dataframe looks like this:
         date  births  with noise bin  smooth_val_mean
0  1959-01-01      35   36.964692   C        35.461173
1  1959-01-02      32   29.861393   B        29.592061
2  1959-01-03      30   27.268515   B        29.592061
3  1959-01-04      31   31.513148   B        29.592061
4  1959-01-05      44   46.194690   E        47.850101
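To make the expected result concrete, here is the rule applied by hand to the 'with noise' values of these five rows, using the per-bin minimum and maximum from the tables above. The helper name smooth_by_boundary is made up for illustration, and returning the mean when a value equals it exactly is an assumption, since that case is not specified.

```python
def smooth_by_boundary(value, mean, lo, hi):
    """Return hi if value is above the bin mean, lo if below, else the mean."""
    if value > mean:
        return hi
    if value < mean:
        return lo
    return mean  # assumption: a value equal to the mean stays at the mean

# (value, bin mean, bin min, bin max), copied from the tables above
rows = [
    (36.964692, 35.461173, 32.064698, 38.447807),  # row 0, bin C
    (29.861393, 29.592061, 25.921175, 31.738422),  # row 1, bin B
    (27.268515, 29.592061, 25.921175, 31.738422),  # row 2, bin B
    (31.513148, 29.592061, 25.921175, 31.738422),  # row 3, bin B
    (46.194690, 47.850101, 45.022163, 51.274550),  # row 4, bin E
]

expected = [smooth_by_boundary(*row) for row in rows]
print(expected)
# [38.447807, 31.738422, 25.921175, 31.738422, 45.022163]
```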
How should I do this using pandas/numpy?
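Let's try this function. It assumes df_min, df_max and df_mean hold the per-bin groupby results, indexed by bin label:

    import numpy as np

    def thresh(col):
        # Look up each row's bin statistics; astype(float) also covers a categorical 'bin' column
        means = df['bin'].map(df_mean[col]).astype(float)
        mins = df['bin'].map(df_min[col]).astype(float)
        maxs = df['bin'].map(df_max[col]).astype(float)
        # +1 where the value is above its bin mean, -1 where below, 0 where equal
        signs = np.sign(df[col] - means)
        # Above the mean -> bin max, below -> bin min, equal -> the mean itself
        df[f'{col}_smooth'] = np.select((signs == 1, signs == -1), (maxs, mins), means)

    for col in ['with noise']:
        thresh(col)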
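A possible variant of the same idea skips the separate lookup tables and broadcasts the per-bin statistics back onto the rows with groupby(...).transform. This is only a sketch: it uses the five sample rows from the question, so the minima, maxima and means here come from that sample alone and differ from the full-data tables.

```python
import numpy as np
import pandas as pd

# Five sample rows from the question (only the columns needed here).
df = pd.DataFrame({
    "with noise": [36.964692, 29.861393, 27.268515, 31.513148, 46.194690],
    "bin": ["C", "B", "B", "B", "E"],
})

col = "with noise"
g = df.groupby("bin")[col]

# transform returns one value per original row, aligned with df's index.
lo = g.transform("min")
hi = g.transform("max")
mean = g.transform("mean")

# Above the bin mean -> bin max, below -> bin min, equal -> the mean itself.
df[f"{col}_smooth"] = np.select([df[col] > mean, df[col] < mean], [hi, lo], default=mean)
print(df)
```

Bins C and E contain a single row each in this sample, so those values are left unchanged; in bin B the values move to 31.513148 or 27.268515 depending on which side of the sample mean they fall.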