R moving average

As an example I use the Boston data with 3 columns (id (added), medv, lstat) and 506 observations.

I want to calculate a leave-one-out mean (a "moving average" over the other n-1 observations) for the variable medv. This means that, for each row, the mean should be calculated over all observations except that row. For id 1, the mean is calculated from rows 2-506. For id 2, it is calculated from rows 1 and 3-506. For id 3, it is calculated from rows 1-2 and 4-506, and so on.

In a second step the mean should be conditional, e.g. on being above or below the median, and written to a separate column. This means that we first check whether a value within each column (medv and lstat) is above or below that column's median. If the value in medv is above the median of medv, we calculate the mean of lstat from the lstat values that are above the median of lstat. If the value in medv is below the median of medv, we calculate the mean of lstat from the lstat values that are below the median of lstat. See the example table below for the first 10 rows. The median for the first 10 rows is 25.55 for medv and 7.24 for lstat.

Here is the data:

library(mlbench)
data(BostonHousing)
df <- BostonHousing
df$id <- seq.int(nrow(df))
df <- subset(df, select = c(id, medv, lstat))

id medv lstat mean1out meancond
 1 24.0  4.98 26.66667     4.50
 2 21.6  9.14 26.93333     4.50
 3 34.7  4.03 25.47778    17.55
 4 33.4  2.94 25.62222    17.55
 5 36.2  5.33 25.31111    17.55
 6 28.7  5.21 26.14444    17.55
 7 22.9 12.43 26.78889     4.50
 8 27.1 19.15 26.32222    17.55
 9 16.5 29.93 27.50000     4.50
10 18.9 17.10 27.23333     4.50

Assuming dat holds the 10-row sample table above, you can drop a single row (here row 3) with a negative index:

mean(dat$medv[-3])
# [1] 25.47778

sapply(seq_len(nrow(dat)), function(i) mean(dat$medv[-i]))
#  [1] 26.66667 26.93333 25.47778 25.62222 25.31111 26.14444 26.78889 26.32222 27.50000 27.23333

Alternatively (mathematically), without the sapply, you can get the same numbers this way:

n <- nrow(dat)
(mean(dat$medv)*n - dat$medv)/(n-1)
#  [1] 26.66667 26.93333 25.47778 25.62222 25.31111 26.14444 26.78889 26.32222 27.50000 27.23333
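
The closed form works because removing one value from a sum of n values leaves sum(x) - x[i], which is then spread over the remaining n - 1 values. As a minimal sketch, using the ten medv values from the sample table (the names medv10, loo_loop and loo_closed are only for illustration), you can check that both approaches agree:

```r
# First ten medv values, copied from the sample table in the question
medv10 <- c(24.0, 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9)
n <- length(medv10)

# Explicit version: drop element i, then average the rest
loo_loop <- sapply(seq_len(n), function(i) mean(medv10[-i]))

# Vectorised version: total minus the excluded value, divided by n - 1
loo_closed <- (sum(medv10) - medv10) / (n - 1)

# Compare the two results up to floating-point tolerance
isTRUE(all.equal(loo_loop, loo_closed))
```

On the full 506-row data the vectorised form avoids recomputing a mean once per row.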

For your conditional mean, a simple ifelse works:

n <- nrow(dat)
transform(
  dat,
  a = (mean(dat$medv)*n - dat$medv)/(n-1),
  b = ifelse(medv <= median(medv),
             mean(lstat[ lstat <= median(lstat) ]),
             mean(lstat[ lstat > median(lstat) ]))
)
#    id medv lstat mean1out meancond        a      b
# 1   1 24.0  4.98 26.66667     4.50 26.66667  4.498
# 2   2 21.6  9.14 26.93333     4.50 26.93333  4.498
# 3   3 34.7  4.03 25.47778    17.55 25.47778 17.550
# 4   4 33.4  2.94 25.62222    17.55 25.62222 17.550
# 5   5 36.2  5.33 25.31111    17.55 25.31111 17.550
# 6   6 28.7  5.21 26.14444    17.55 26.14444 17.550
# 7   7 22.9 12.43 26.78889     4.50 26.78889  4.498
# 8   8 27.1 19.15 26.32222    17.55 26.32222 17.550
# 9   9 16.5 29.93 27.50000     4.50 27.50000  4.498
# 10 10 18.9 17.10 27.23333     4.50 27.23333  4.498

(I'm inferring that the differences between meancond and b, 4.50 vs. 4.498, are rounding on data entry.)

The first part of the problem is already solved by @r2evans.

For the second part we can calculate the medians of lstat and medv, compare each medv value against its median, and assign the matching lstat mean.

# First part, from @r2evans's answer
n <- nrow(df)
df$mean1out <- (mean(df$medv)*n - df$medv)/(n-1)

# Second part
med_lstat <- median(df$lstat)
med_medv <- median(df$medv)
# Means of lstat strictly above / strictly below its median
higher_lstat <- mean(df$lstat[df$lstat > med_lstat])
lower_lstat <- mean(df$lstat[df$lstat < med_lstat])
df$meancond <- ifelse(df$medv > med_medv, higher_lstat, lower_lstat)
df

#   id medv lstat mean1out meancond
#1   1 24.0  4.98 26.66667    4.498
#2   2 21.6  9.14 26.93333    4.498
#3   3 34.7  4.03 25.47778   17.550
#4   4 33.4  2.94 25.62222   17.550
#5   5 36.2  5.33 25.31111   17.550
#6   6 28.7  5.21 26.14444   17.550
#7   7 22.9 12.43 26.78889    4.498
#8   8 27.1 19.15 26.32222   17.550
#9   9 16.5 29.93 27.50000    4.498
#10 10 18.9 17.10 27.23333    4.498
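
Both steps can also be wrapped in one function, so the same code runs on the sample and on the full 506-row data. This is only a sketch: the function name add_loo_and_cond_mean and its arguments are made up for illustration, and it assumes that values exactly equal to a median count as "below" (the <= convention from the first answer), since the question does not say how ties should be handled.

```r
# Hypothetical helper; name and arguments are illustrative, not from the answers
add_loo_and_cond_mean <- function(d, x = "medv", y = "lstat") {
  n <- nrow(d)
  # Leave-one-out mean of x: total minus the row's own value, over n - 1
  d$mean1out <- (sum(d[[x]]) - d[[x]]) / (n - 1)

  # Means of y on each side of its own median
  # (assumption: ties with the median are counted as "below")
  y_med  <- median(d[[y]])
  y_low  <- mean(d[[y]][d[[y]] <= y_med])
  y_high <- mean(d[[y]][d[[y]] >  y_med])

  # Choose the side according to where x sits relative to its own median
  d$meancond <- ifelse(d[[x]] > median(d[[x]]), y_high, y_low)
  d
}

# Usage on the ten sample rows from the question
sample10 <- data.frame(
  id    = 1:10,
  medv  = c(24.0, 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9),
  lstat = c(4.98, 9.14, 4.03, 2.94, 5.33, 5.21, 12.43, 19.15, 29.93, 17.10)
)
res <- add_loo_and_cond_mean(sample10)
res
```

Passing the column names as arguments keeps the helper usable for other variable pairs in the same data frame.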

data

library(mlbench)
data(BostonHousing)
df <- BostonHousing
df$id <- seq.int(nrow(df))
df <- subset(df, select = c(id, medv, lstat))
df <- head(df, 10)  # first 10 rows only, to match the sample table
