
R: replacing certain values in a data frame with the mean of a particular range of data

Despite going through many topics and comments on Stack Overflow, I could not find an answer to this question. I hope it gets answered here. Thanks.

Let us say we have a data frame like this:

> my_data

   year month day           pr         max        min
1  2081     1   1 5.569092e-04 -26.4920749 -24.483246
2  2081     1   2 1.777802e-04 -25.9205721 -20.451972
3  2081     1   3 1.323720e-03 -27.1527253 -10.395930
4  2081     1   4 2.137142e-03 -20.8107204  -9.002432
5  2081     1   5 7.505645e-04 -16.2825782  -8.997454
6  2081     1   6 2.812341e-03  -8.4525805  -8.973068
7  2081     1   7 3.484746e-03  -0.3836075 -16.055945
8  2081     1   8 4.613059e-04  -0.2103037 -25.410168
9  2081     1   9 7.486442e-04  -3.7030182 -27.551599
10 2081     1  10 3.175442e-03  -1.5308754 -27.882620
11 2081     1  11 2.399104e-03  -5.9834657 -24.491168
12 2081     1  12 1.833119e-03 -13.7263921 -22.745234
13 2081     1  13 3.315489e-04 -18.3818204 -22.640128
14 2081     1  14 1.180063e-04 -18.6430621 -17.468890
15 2081     1  15 2.493895e-05 -15.5209717 -16.186150
16 2081     1  16 6.260483e-05 -15.8603685 -11.547591
17 2081     1  17 2.280691e-04 -10.4212179  -7.414533
18 2081     1  18 5.984287e-04  -7.9375899  -7.400936
19 2081     1  19 7.078201e-04  -7.3717562 -13.400183
20 2081     1  20 2.017283e-03  -5.7114717 -17.213636
21 2081     1  21 9.261695e-04  -4.0757166 -18.174468
22 2081     1  22 1.107990e-03  -4.7120487 -18.968903
23 2081     1  23 1.698175e-03  -7.0420167 -17.654700
24 2081     1  24 1.468677e-03 -11.8686058 -12.688654
25 2081     1  25 5.597740e-04 -11.5570338  -9.391358
26 2081     1  26 2.446489e-04 -10.5752366  -8.349224
27 2081     1  27 1.485243e-04  -8.6466939  -7.059217
28 2081     1  28 4.694722e-04  -6.1383411 -12.353198
29 2081     1  29 3.802654e-04  -2.1109669 -15.652165
30 2081     1  30 9.396260e-04  -0.1226451 -19.592908
31 2081     1  31 2.871977e-03  -0.7997992 -22.973038

I want to compute a new value for max in each row where max is less than or equal to min.

I am using this code:

for (i in 1:nrow(my_data)) {
   # Replace the max with the Mean of max in row i to i+10:
   my_data$max[my_data$max <= my_data$min] <- mean(my_data$max[i:(i+10)])  

}

But the result looks like this:

> head(my_data)

  year month day           pr        max        min
1 2081     1   1 0.0005569092 -12.447502 -24.483246
2 2081     1   2 0.0001777802 -12.447502 -20.451972
3 2081     1   3 0.0013237197  -7.616386 -10.395930
4 2081     1   4 0.0021371418  -7.616386  -9.002432
5 2081     1   5 0.0007505645  -7.616386  -8.997454
6 2081     1   6 0.0028123409  -8.452581  -8.973068

This does not make any sense.

The result should look like this:

  year month day           pr         max        min
1 2081     1   1 0.0005569092  -12.447502 -24.483246
2 2081     1   2 0.0001777802  -11.286985 -20.451972
3 2081     1   3 0.0013237197  -10.601644 -10.395930
4 2081     1   4 0.0021371418  -9.8280386  -9.002432
5 2081     1   5 0.0007505645  -9.3471523  -8.997454
6 2081     1   6 0.0028123409  -9.3087696  -8.973068

Your comments or answers would be highly appreciated.

EDIT:

If we use Excel, this would be the whole desired result:

   year month day          pr         max        min
1  2081     1   1 0.000556909 -12.4475020 -24.483246
2  2081     1   2 0.000177780 -11.2869854 -20.451972
3  2081     1   3 0.001323720 -10.6016443 -10.395930
4  2081     1   4 0.002137142  -9.8280386  -9.002432
5  2081     1   5 0.000750565  -9.3471523  -8.997454
6  2081     1   6 0.002812341  -8.4525805  -8.973068
7  2081     1   7 0.003484746  -0.3836075 -16.055945
8  2081     1   8 0.000461306  -0.2103037 -25.410168
9  2081     1   9 0.000748644  -3.7030182 -27.551599
10 2081     1  10 0.003175442  -1.5308754 -27.882620
11 2081     1  11 0.002399104  -5.9834657 -24.491168
12 2081     1  12 0.001833119 -13.7263921 -22.745234
13 2081     1  13 0.000331549 -18.3818204 -22.640128
14 2081     1  14 0.000118006  -9.9240751 -17.468890
15 2081     1  15 0.000024900 -15.5209717 -16.186150
16 2081     1  16 0.000062600  -8.8302784 -11.547591
17 2081     1  17 0.000228069  -8.1744898  -7.414533
18 2081     1  18 0.000598429  -7.7851374  -7.400936
19 2081     1  19 0.000707820  -7.3717562 -13.400183
20 2081     1  20 0.002017283  -5.7114717 -17.213636
21 2081     1  21 0.000926170  -4.0757166 -18.174468
22 2081     1  22 0.001107990  -4.7120487 -18.968903
23 2081     1  23 0.001698175  -7.0420167 -17.654700
24 2081     1  24 0.001468677 -11.8686058 -12.688654
25 2081     1  25 0.000559774  -5.7072452  -9.391358
26 2081     1  26 0.000244649  -4.7322805  -8.349224
27 2081     1  27 0.000148524  -3.5636892  -7.059217
28 2081     1  28 0.000469472  -6.1383411 -12.353198
29 2081     1  29 0.000380265  -2.1109669 -15.652165
30 2081     1  30 0.000939626  -0.1226451 -19.592908
31 2081     1  31 0.002871977  -0.7997992 -22.973038

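The loop fails because the logical index my_data$max <= my_data$min selects every qualifying row at once, so each iteration overwrites all of them with one scalar mean instead of giving row i its own value. Also, i+10 runs past the last row near the end of the data, which yields NA. Instead, compute the mean of max over rows i to i+10 (capped at the last row) for every row first, then assign only where max <= min:

# Mean of max over rows i..i+10 for each row, capped at the last row
v1 <- sapply(seq_len(nrow(my_data)), function(i) 
             mean(my_data$max[i:(min(nrow(my_data), (i+10)))]))
# Rows where max is less than or equal to min
i1 <- with(my_data, max <= min)
# Replace only those rows with their own window mean
my_data$max[i1] <- v1[i1]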
#   year month day           pr         max        min
#1  2081     1   1 5.569092e-04 -12.4475020 -24.483246
#2  2081     1   2 1.777802e-04 -11.2869854 -20.451972
#3  2081     1   3 1.323720e-03 -10.6016443 -10.395930
#4  2081     1   4 2.137142e-03  -9.8280386  -9.002432
#5  2081     1   5 7.505645e-04  -9.3471523  -8.997454
#6  2081     1   6 2.812341e-03  -8.4525805  -8.973068
#7  2081     1   7 3.484746e-03  -0.3836075 -16.055945
#8  2081     1   8 4.613059e-04  -0.2103037 -25.410168
#9  2081     1   9 7.486442e-04  -3.7030182 -27.551599
#10 2081     1  10 3.175442e-03  -1.5308754 -27.882620
#11 2081     1  11 2.399104e-03  -5.9834657 -24.491168
#12 2081     1  12 1.833119e-03 -13.7263921 -22.745234
#13 2081     1  13 3.315489e-04 -18.3818204 -22.640128
#14 2081     1  14 1.180063e-04  -9.9240751 -17.468890
#15 2081     1  15 2.493895e-05 -15.5209717 -16.186150
#16 2081     1  16 6.260483e-05  -8.8302784 -11.547591
#17 2081     1  17 2.280691e-04  -8.1744898  -7.414533
#18 2081     1  18 5.984287e-04  -7.7851374  -7.400936
#19 2081     1  19 7.078201e-04  -7.3717562 -13.400183
#20 2081     1  20 2.017283e-03  -5.7114717 -17.213636
#21 2081     1  21 9.261695e-04  -4.0757166 -18.174468
#22 2081     1  22 1.107990e-03  -4.7120487 -18.968903
#23 2081     1  23 1.698175e-03  -7.0420167 -17.654700
#24 2081     1  24 1.468677e-03 -11.8686058 -12.688654
#25 2081     1  25 5.597740e-04  -5.7072452  -9.391358
#26 2081     1  26 2.446489e-04  -4.7322805  -8.349224
#27 2081     1  27 1.485243e-04  -3.5636892  -7.059217
#28 2081     1  28 4.694722e-04  -6.1383411 -12.353198
#29 2081     1  29 3.802654e-04  -2.1109669 -15.652165
#30 2081     1  30 9.396260e-04  -0.1226451 -19.592908
#31 2081     1  31 2.871977e-03  -0.7997992 -22.973038
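
For anyone who wants to try the same idea on a smaller example, here is a minimal sketch in base R. The helper name forward_mean, the window argument k, and the toy data frame are all made up for illustration and are not part of the original answer. It assumes the window should simply shrink near the end of the data, as in the code above.

```r
# Hypothetical helper (not from the original answer):
# for each position i, the mean of x[i], ..., x[i + k],
# with the window capped at the last element.
forward_mean <- function(x, k = 10) {
  n <- length(x)
  vapply(seq_len(n), function(i) mean(x[i:min(n, i + k)]), numeric(1))
}

# Toy data invented for illustration only
toy <- data.frame(max = c(1, 5, 2, 8, 3),
                  min = c(4, 0, 6, 1, 9))

# Window means are taken from the ORIGINAL max column,
# before any value is replaced
fm <- forward_mean(toy$max, k = 2)

# Rows where max is less than or equal to min
needs_fix <- toy$max <= toy$min

# Each flagged row receives its own window mean
toy$max[needs_fix] <- fm[needs_fix]
```

Because all window means are computed before anything is assigned, a replaced value never feeds into the mean of another row. If the means should instead be updated as rows are replaced, a sequential loop that writes one row at a time would be needed.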

Can you please post the output of dput(my_data)?
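
In case it helps, dput() writes an object as R code that rebuilds it, which lets others recreate the data exactly instead of retyping a printed table. A small sketch with a made-up two-row data frame (the values are placeholders, not the real data):

```r
# Made-up sample standing in for my_data
df <- data.frame(year = c(2081L, 2081L),
                 max  = c(-26.5, -25.9),
                 min  = c(-24.5, -20.5))

# dput() prints R code that reconstructs the object;
# capture.output() collects that text instead of printing it
txt <- capture.output(dput(df))

# Evaluating the captured text gives back an equivalent data frame
rebuilt <- eval(parse(text = txt))
```

For a large data frame, posting dput(head(my_data, 20)) is usually enough.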
