Average of different subsets of variables with condition in R

Sample data:

    
ID  month1  month2  month3  month4  month5  month6  month7  month8  month9  month10  b1  b2
-------------------------------------------------------------------------------------------
1   12      14      15      45      12      12      11      12      78      28       3   9
2   14      15      45      14      15      45      14      19      22      27       4   8
3   14      13      25      74      25      45      14      19      22      27       5   10
.
.
.
.
70  ...     ...     ...     ...     ...     ...     ...     ...     ...     ...      1   8

I want to calculate the average of the "month" variables for each ID, over the months from b1 (the month of interview 1) to b2 (the month of interview 2), inclusive. The averages are therefore computed row-wise.

For example, for ID = 1, who was first interviewed in month 3 and then again in month 9, the average will be (month3 + month4 + month5 + month6 + month7 + month8 + month9)/7, which is (15 + 45 + 12 + 12 + 11 + 12 + 78)/7 = 26.43

and

for ID = 2, the average will be (month4 + month5 + month6 + month7 + month8)/5, which is (14 + 15 + 45 + 14 + 19)/5 = 21.4

and so on.

I am working in RStudio, so I would prefer a solution written in R. Thanks in advance!

This solution works provided that the month columns are stored in order (month1 to month10), because it picks them by position within each row.

library(dplyr)

df %>%
  rowwise() %>%
  # c_across() collects this row's month values; b1 and b2 are this row's scalars
  mutate(avg = mean(c_across(starts_with("month"))[b1:b2], na.rm = TRUE)) %>%
  select(-ID)

# Rowwise: 
  month1 month2 month3 month4 month5 month6 month7 month8 month9 month10    b1    b2   avg
   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl>
1     12     14     15     45     12     12     11     12     78      28     3     9  26.4
2     14     15     45     14     15     45     14     19     22      27     4     8  21.4
3     14     13     25     74     25     45     14     19     22      27     5    10  25.3

Sample data:

df <- tribble(
  ~ID, ~month1, ~month2, ~month3, ~month4, ~month5, ~month6, ~month7, ~month8, ~month9, ~month10, ~b1, ~b2,
  1,   12,      14,      15,      45,      12,      12,      11,      12,      78,      28,       3,   9,
  2,   14,      15,      45,      14,      15,      45,      14,      19,      22,      27,       4,   8,
  3,   14,      13,      25,      74,      25,      45,      14,      19,      22,      27,       5,   10,
)
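
If the column order cannot be relied on, the month columns can instead be looked up by name. The following is a minimal base R sketch under that assumption. The helper window_mean() and the result column avg_by_name are illustrative names, not part of the answer above, and the columns are reversed on purpose to show that position no longer matters.

```r
# Sample rows from the question, as a plain data frame
df <- data.frame(
  ID = 1:3,
  month1 = c(12, 14, 14), month2 = c(14, 15, 13), month3 = c(15, 45, 25),
  month4 = c(45, 14, 74), month5 = c(12, 15, 25), month6 = c(12, 45, 45),
  month7 = c(11, 14, 14), month8 = c(12, 19, 19), month9 = c(78, 22, 22),
  month10 = c(28, 27, 27),
  b1 = c(3, 4, 5), b2 = c(9, 8, 10)
)

# Reverse the column order to simulate a data set whose layout has changed
df <- df[rev(names(df))]

# Hypothetical helper: mean of month<first> .. month<last> for row i, by name
window_mean <- function(data, i, first, last) {
  wanted <- paste0("month", first:last)
  mean(unlist(data[i, wanted]), na.rm = TRUE)
}

# One mean per row, using that row's own b1 and b2
df$avg_by_name <- vapply(
  seq_len(nrow(df)),
  function(i) window_mean(df, i, df$b1[i], df$b2[i]),
  numeric(1)
)

round(df$avg_by_name, 2)
```

For the three sample rows this gives roughly 26.43, 21.40 and 25.33, which agrees with the hand calculation in the question.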

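Base R option using mapply:

# Names of the month columns, in the order they appear in df
cols <- grep('month', names(df), value = TRUE)
# For each row x, average the month columns from position b1 (y) to b2 (z)
df$result <- mapply(function(x, y, z) mean(unlist(df[x, cols[y:z]]), na.rm = TRUE),
                     seq(nrow(df)), df$b1, df$b2)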

You can use apply to go by row, subset the vector and calculate the mean:

apply(df[-1], 1, function(x) mean(as.numeric(x[x[11]:x[12]])))
#[1] 26.42857 21.40000 25.33333
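
For larger tables, the same windowed mean can probably be computed without calling a function once per row. The sketch below is one possible vectorized variant, not something taken from the answers: it assumes the columns are named month1 to month10 and that b1 <= b2 in every row, and the names months, in_window and window_avg are purely illustrative.

```r
# Sample rows from the question, as a plain data frame
df <- data.frame(
  ID = 1:3,
  month1 = c(12, 14, 14), month2 = c(14, 15, 13), month3 = c(15, 45, 25),
  month4 = c(45, 14, 74), month5 = c(12, 15, 25), month6 = c(12, 45, 45),
  month7 = c(11, 14, 14), month8 = c(12, 19, 19), month9 = c(78, 22, 22),
  month10 = c(28, 27, 27),
  b1 = c(3, 4, 5), b2 = c(9, 8, 10)
)

# Matrix holding only the month columns, explicitly ordered month1..month10
months <- as.matrix(df[paste0("month", 1:10)])

# TRUE where a cell's column index lies inside [b1, b2] for its row;
# b1 and b2 recycle down the rows because matrices are stored column-major
in_window <- col(months) >= df$b1 & col(months) <= df$b2

# Blank out everything outside the window, then average what is left
months[!in_window] <- NA
window_avg <- rowMeans(months, na.rm = TRUE)

window_avg
```

Because cells outside the window are set to NA and then skipped, missing values inside the window are skipped as well, which matches the na.rm = TRUE behaviour of the other answers.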
