
R: assign value based on criteria

I am trying to convert test scores from a psychological questionnaire, stored in one data set, to standardized scores (range of percentiles) that are defined in another data set.

The test scores consist of one score from each of the 9 people who took my questionnaire:

TestResults <- data.frame(ID = c(1:9),   
               Observed = c(14, 8, 33, 23, 5, 79, 2, 11, 5), Results = NA) 

The scoring sheet from the test publisher, entered manually in R and shortened here for simplicity:

ScoringSheet <- data.frame(Percentiles = c(99,95,85,55,10), Score = c(79,33,20,15,5))

I would like to fill the column Results with the percentile values from the ScoringSheet that correspond to the observed scores. The scoring follows a simple algorithm, which I just can't get implemented in R:

1. If TestResults$Observed %in% ScoringSheet$Score, then Results should be the corresponding Percentiles value in the ScoringSheet.
2. If !(TestResults$Observed %in% ScoringSheet$Score), then TestResults$Results should be the average of the two ScoringSheet$Percentiles values between which the Observed score falls.
3. If TestResults$Observed < min(ScoringSheet$Score), then the Results value for these smallest observed scores should be min(ScoringSheet$Percentiles)/2.
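To make the three rules concrete, here is a minimal hand check on three single scores. It only uses the scoring sheet from the question; the names rule1, rule2 and rule3 are made up for this illustration.

```r
# Scoring sheet from the question
ScoringSheet <- data.frame(Percentiles = c(99, 95, 85, 55, 10),
                           Score = c(79, 33, 20, 15, 5))

# Rule 1: 33 is listed in the sheet, so look its percentile up directly
rule1 <- ScoringSheet$Percentiles[ScoringSheet$Score == 33]

# Rule 2: 23 lies between the listed scores 20 and 33,
# so average their percentiles (85 and 95)
rule2 <- mean(ScoringSheet$Percentiles[ScoringSheet$Score %in% c(20, 33)])

# Rule 3: 2 is below the lowest listed score, so halve the lowest percentile
rule3 <- min(ScoringSheet$Percentiles) / 2

c(rule1, rule2, rule3)  # 95 90 5
```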

As a result, I would need this:

TestResults <- data.frame(ID = c(1:9), 
                           Observed = c(14, 8, 33, 23, 5, 79, 2, 11, 5), 
                           Results = c(32.5, 32.5, 95, 90, 10, 99, 5, 32.5, 10))

So far, I can get the corresponding percentiles for criterion 1 by using merge() on TestResults$Observed and ScoringSheet$Score, which creates NAs for the values that do not match exactly. I am now wondering how to implement criteria 2 and 3.
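For reference, the merge() step for criterion 1 might look roughly like the sketch below. The exact call is not shown in the question, so the all.x = TRUE left join and the name merged are assumptions.

```r
TestResults <- data.frame(ID = 1:9,
                          Observed = c(14, 8, 33, 23, 5, 79, 2, 11, 5))
ScoringSheet <- data.frame(Percentiles = c(99, 95, 85, 55, 10),
                           Score = c(79, 33, 20, 15, 5))

# Left join: keep every test result, attach a percentile only on exact matches
merged <- merge(TestResults, ScoringSheet,
                by.x = "Observed", by.y = "Score", all.x = TRUE)

# merge() sorts by the key column, so restore the original ID order
merged <- merged[order(merged$ID), ]
merged
```

Only the observed scores 33, 5 and 79 appear in the sheet, so the rows for IDs 3, 5, 6 and 9 get a percentile and the rest stay NA.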

Thank you in advance!

Probably not the nicest solution, but it does the job. First we sort the ScoringSheet by score, then we use match() to find the exact matches. Finally we loop over all the records for which no exact match was found and apply your calculation there. I added a rule for scores higher than the 99th percentile score, in which case the result is set to the highest percentile value. I also added two more entries (100 and 55) to show that the code below handles these cases properly.

TestResults <- data.frame(ID = c(1:11),   
                          Observed = c(14, 8, 33, 23, 5, 79, 2, 11, 5,100,55), Results = NA) 

ScoringSheet <- data.frame(Percentiles = c(99,95,85,55,10), Score = c(79,33,20,15,5))

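# Sort the scoring sheet by ascending score so that row positions reflect rank
ScoringSheet <- ScoringSheet[order(ScoringSheet$Score), ]
# Rule 1: exact matches get their percentile directly, everything else becomes NA
TestResults$Results <- ScoringSheet$Percentiles[match(TestResults$Observed, ScoringSheet$Score)]
for (i in which(is.na(TestResults$Results))) {
  # Row of the highest listed score that is still below the observed score
  x <- tail(which(TestResults$Observed[i] > ScoringSheet$Score), 1)
  if (length(x) > 0) {
    # Rule 2: average the percentiles of the two neighbouring scores;
    # above the top score both neighbours are the last row
    TestResults$Results[i] <- mean(ScoringSheet$Percentiles[c(x, min(x + 1, nrow(ScoringSheet)))])
  } else {
    # Rule 3: below the lowest score, use half of the lowest percentile
    TestResults$Results[i] <- ScoringSheet$Percentiles[1] / 2
  }
}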

Output:

   ID Observed Results
1   1       14    32.5
2   2        8    32.5
3   3       33    95.0
4   4       23    90.0
5   5        5    10.0
6   6       79    99.0
7   7        2     5.0
8   8       11    32.5
9   9        5    10.0
10 10      100    99.0
11 11       55    97.0
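If you would rather avoid the loop, the same rules can probably be written in a vectorized way with findInterval(). This is only a sketch of an alternative, not part of the solution above; the helper names sheet, lower, upper and exact are made up here.

```r
TestResults <- data.frame(ID = 1:11,
                          Observed = c(14, 8, 33, 23, 5, 79, 2, 11, 5, 100, 55))
ScoringSheet <- data.frame(Percentiles = c(99, 95, 85, 55, 10),
                           Score = c(79, 33, 20, 15, 5))

# findInterval() needs the break points in ascending order
sheet <- ScoringSheet[order(ScoringSheet$Score), ]
n <- nrow(sheet)

# lower[i] = number of listed scores <= Observed[i] (0 means below the lowest)
lower <- findInterval(TestResults$Observed, sheet$Score)
exact <- TestResults$Observed %in% sheet$Score

# Row of the next listed score up, capped at the top of the sheet
upper <- pmin(lower + 1, n)
# Guard against index 0 for scores below the lowest listed score
lower_idx <- pmax(lower, 1)

TestResults$Results <- ifelse(
  lower == 0, sheet$Percentiles[1] / 2,                        # rule 3
  ifelse(exact, sheet$Percentiles[lower_idx],                  # rule 1
         (sheet$Percentiles[lower_idx] + sheet$Percentiles[upper]) / 2))  # rule 2
TestResults
```

On the eleven scores used above, this should give the same Results column as the loop.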

Hope this helps!
