
Column mean, variance and boolean mutation with multiple conditions in R

I have a title-day panel dataset in long format. In this reproducible example there are three persons (1, 2 and 3) who each gave a score. For each person there are three columns: the score itself, a boolean indicating whether a score has been given for that title and day, and the day on which the score was coded. The last of these is the only variable that stays constant within a title for that person. When a person did not give any score, this is indicated with NA. See df1:

title <- c("x","x","x","x","y","y","y","y","z","z","z","z")
day <- c(0,1,2,3,0,1,2,3,0,1,2,3)
avg_score <- c(0,0,0,0,0,0,0,0,0,0,0,0)
variance <- c(0,0,0,0,0,0,0,0,0,0,0,0)
score_or_not <- c(0,0,0,0,0,0,0,0,0,0,0,0)
score_1 <- c(0,0,0,30,NA,NA,NA,NA,0,0,0,50)
score_or_not1 <- c(0,0,0,1,NA,NA,NA,NA,0,0,0,1)
score_day1 <- c(3,3,3,3,NA,NA,NA,NA,3,3,3,3)
score_2 <- c(NA,NA,NA,NA,0,80,80,80,0,0,80,80)
score_or_not2 <- c(NA,NA,NA,NA,0,1,1,1,0,0,1,1)
score_day2 <- c(NA,NA,NA,NA,1,1,1,1,2,2,2,2)
score_3 <- c(0,0,0,0,NA,NA,NA,NA,90,90,90,90)
score_or_not3 <- c(0,0,0,0,NA,NA,NA,NA,1,1,1,1)
score_day3 <- c(-2,-2,-2,-2,NA,NA,NA,NA,0,0,0,0)

df1 <- data.frame(title,day,avg_score,variance,score_or_not,score_1,score_or_not1,score_day1,score_2,score_or_not2,score_day2,score_3,score_or_not3,score_day3)

I'm stuck on the following problem. I need to fill three columns (avg_score, variance and score_or_not) based on the given scores. There are some conditions: when score_day is negative or zero, the score should not be taken into account for the new columns and should be ignored, just like the NA values. It is important that the NA values stay NA and that the negative or zero values also stay unchanged in the per-person columns.

Here is a description of the three new variables:

1. avg_score should be the average of all the scores that are given, counting only those that fulfill the condition. When there is just one score, that score should be the value of avg_score.
2. variance should be 0 when there is no score or just one score available. When there are two or more, the variance of those scores should be calculated in this column.
3. score_or_not should be a boolean that shows whether a score is available on that day, again taking the conditions into account.
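To make the three rules concrete, here is a minimal base R sketch for a single title-day row. The helper name summarise_scores and its argument names are hypothetical and only serve as illustration; it assumes a score counts when its flag is 1 and its score_day is positive, as described above.

```r
# Hypothetical helper: summarise one title-day row across persons.
summarise_scores <- function(score, given, score_day) {
  # A score counts only if it was given (flag == 1) and coded on a positive day.
  # NA in either column means that person is ignored.
  keep <- !is.na(given) & given == 1 & !is.na(score_day) & score_day > 0
  valid <- score[keep]
  n <- length(valid)
  c(avg_score    = if (n > 0) mean(valid) else 0,  # a single score is its own mean
    variance     = if (n > 1) var(valid) else 0,   # sample variance needs >= 2 scores
    score_or_not = as.numeric(n > 0))
}

# Title "z", day 3 from df1: persons 1 and 2 qualify, person 3 has score_day 0.
summarise_scores(score = c(50, 80, 90), given = c(1, 1, 1), score_day = c(3, 2, 0))
```

For that row the two qualifying scores are 50 and 80, so the mean is 65 and the sample variance is 450, which matches the expected output for row 12.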

The result should look like this:

title <- c("x","x","x","x","y","y","y","y","z","z","z","z")
day <- c(0,1,2,3,0,1,2,3,0,1,2,3)
avg_score <- c(0,0,0,30,0,80,80,80,0,0,80,65)
variance <- c(0,0,0,0,0,0,0,0,0,0,0,450)
score_or_not <- c(0,0,0,1,0,1,1,1,0,0,1,1)
score_1 <- c(0,0,0,30,NA,NA,NA,NA,0,0,0,50)
score_or_not1 <- c(0,0,0,1,NA,NA,NA,NA,0,0,0,1)
score_day1 <- c(3,3,3,3,NA,NA,NA,NA,3,3,3,3)
score_2 <- c(NA,NA,NA,NA,0,80,80,80,0,0,80,80)
score_or_not2 <- c(NA,NA,NA,NA,0,1,1,1,0,0,1,1)
score_day2 <- c(NA,NA,NA,NA,1,1,1,1,2,2,2,2)
score_3 <- c(0,0,0,0,NA,NA,NA,NA,90,90,90,90)
score_or_not3 <- c(0,0,0,0,NA,NA,NA,NA,1,1,1,1)
score_day3 <- c(-2,-2,-2,-2,NA,NA,NA,NA,0,0,0,0)

Output <- data.frame(title,day,avg_score,variance,score_or_not,score_1,score_or_not1,score_day1,score_2,score_or_not2,score_day2,score_3,score_or_not3,score_day3)

I hope you can help with this specific problem.

It is probably easiest to reshape the data to long format, do the calculations across all three persons after filtering on your conditions, and then join the result back to the original data frame.

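library(dplyr)
library(tidyr)

# Drop the all-zero placeholder columns first; otherwise the join would
# produce clashing names such as avg_score.x and avg_score.y.
base <- select(df1, -c(avg_score, variance, score_or_not))

# One row per title, day and person. "score_1" is split into
# .value = "score_" and person = "1", hence the column name score_ below.
stats <- pivot_longer(base, cols=-c(title, day),
                      names_to=c(".value","person"),
                      names_pattern="(.*)(\\d)") %>%
  # Keep only scores that were given and coded on a positive day.
  filter(score_day>0 & score_or_not==1) %>%
  group_by(title, day) %>%
  summarise(avg_score=mean(score_, na.rm=TRUE),
            variance=var(score_, na.rm=TRUE),   # NA when only one score
            score_or_not=+(avg_score>0),
            .groups="drop")

# Title-day rows without any valid score are NA after the join; set them to 0.
left_join(base, stats, by=c('title','day')) %>%
  mutate(avg_score=replace_na(avg_score,0),
         variance=replace_na(variance, 0),
         score_or_not=replace_na(score_or_not, 0))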

Result:

...

   avg_score variance score_or_not
1          0        0            0
2          0        0            0
3          0        0            0
4         30        0            1
5          0        0            0
6         80        0            1
7         80        0            1
8         80        0            1
9          0        0            0
10         0        0            0
11        80        0            1
12        65      450            1
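To see why the summary step refers to a column called score_, it can help to look at the reshaped data on its own. The sketch below uses a small hypothetical data frame named small (two rows of title "z" with only persons 1 and 2) rather than the full df1, and assumes tidyr 1.0 or later for pivot_longer.

```r
library(tidyr)

# Hypothetical two-row excerpt of title "z", per-person columns only.
small <- data.frame(
  title = c("z", "z"), day = c(2, 3),
  score_1 = c(0, 50), score_or_not1 = c(0, 1), score_day1 = c(3, 3),
  score_2 = c(80, 80), score_or_not2 = c(1, 1), score_day2 = c(2, 2)
)

# "(.*)(\\d)" splits each name into a value-column part and the trailing digit:
# "score_1" -> "score_" + "1", "score_day2" -> "score_day" + "2".
long <- pivot_longer(small, cols = -c(title, day),
                     names_to = c(".value", "person"),
                     names_pattern = "(.*)(\\d)")
long
```

Each title-day row becomes one row per person, with the columns score_, score_or_not and score_day holding that person's values, so the filter and the grouped mean and variance can be applied to all persons at once.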
