I am trying to convert test scores from a psychological questionnaire in one data set to standardized scores (percentiles) taken from another data set.
The test scores consist of one score from each of the 9 people who took my questionnaire:
TestResults <- data.frame(ID = c(1:9),
Observed = c(14, 8, 33, 23, 5, 79, 2, 11, 5), Results = NA)
The scoring sheet comes from the test publisher. I implemented it manually in R and shortened it here for simplicity:
ScoringSheet <- data.frame(Percentiles = c(99,95,85,55,10), Score = c(79,33,20,15,5))
I would like to fill the column Results with the corresponding percentile values for the observed scores from the ScoringSheet. For the scoring, a simple algorithm applies, which I just can't get implemented in R:
1. If TestResults$Observed %in% ScoringSheet$Score, then Results should be the corresponding Percentiles value in the ScoringSheet.
2. If !(TestResults$Observed %in% ScoringSheet$Score), then TestResults$Results should be the average of the two ScoringSheet$Percentiles between which the Observed score falls.
3. If TestResults$Observed < min(ScoringSheet$Score), then the Results value for these smallest observed values should be min(ScoringSheet$Percentiles)/2.
As a result, I would need this:
TestResults <- data.frame(ID = c(1:9),
Observed = c(14, 8, 33, 23, 5, 79, 2, 11, 5),
Results = c(0.5,0.5,95,90,0.5,99,0.5,0.5,0.5))
So far, I can get the corresponding percentiles for criterion 1 using merge() on TestResults$Observed and ScoringSheet$Score, which creates NAs for the values that do not match exactly. I am now wondering how to implement criteria 2 and 3.
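For reference, the merge() step described above might look roughly like the following sketch. The name Merged and the choice to drop the empty Results column are assumptions made for illustration, not part of the original attempt.

```r
TestResults <- data.frame(ID = 1:9,
                          Observed = c(14, 8, 33, 23, 5, 79, 2, 11, 5))
ScoringSheet <- data.frame(Percentiles = c(99, 95, 85, 55, 10),
                           Score = c(79, 33, 20, 15, 5))

# Left join: keep every test result and attach a percentile
# only where the observed score matches a sheet score exactly
Merged <- merge(TestResults, ScoringSheet,
                by.x = "Observed", by.y = "Score",
                all.x = TRUE)

# merge() sorts by the key column, so restore the original order by ID
Merged <- Merged[order(Merged$ID), ]
Merged
```

Rows without an exact match end up with NA in Percentiles, which is the gap that criteria 2 and 3 need to fill.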
Thank you in advance!
Probably not the nicest solution, but it does the job. First we sort the ScoringSheet, then we use match to find exact matches. Finally, we loop over all the records where no exact match was found and apply your calculation there. I added a rule for scores higher than the 99th-percentile score: in that case the result equals the highest percentile. I also added two more entries to show that the code below works properly.
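TestResults <- data.frame(ID = c(1:11),
Observed = c(14, 8, 33, 23, 5, 79, 2, 11, 5, 100, 55), Results = NA)
ScoringSheet <- data.frame(Percentiles = c(99, 95, 85, 55, 10), Score = c(79, 33, 20, 15, 5))
# Sort the scoring sheet by ascending score
ScoringSheet <- ScoringSheet[order(ScoringSheet$Score, decreasing = FALSE), ]
# Criterion 1: exact matches (NA where there is no exact match)
TestResults$Results <- ScoringSheet$Percentiles[match(TestResults$Observed, ScoringSheet$Score)]
# Criteria 2 and 3: handle the remaining records one by one
for (i in which(is.na(TestResults$Results))) {
  # Row of the highest sheet score below the observed score (empty if there is none)
  x <- tail(which(TestResults$Observed[i] > ScoringSheet$Score), 1)
  if (length(x) > 0) {
    # Average with the next higher row; above the top score this is the top percentile itself
    TestResults$Results[i] <- mean(ScoringSheet$Percentiles[c(x, min(x + 1, nrow(ScoringSheet)))])
  } else {
    # Below the lowest score: half of the lowest percentile
    TestResults$Results[i] <- ScoringSheet$Percentiles[1] / 2
  }
}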
Output:
   ID Observed Results
1   1       14    32.5
2   2        8    32.5
3   3       33    95.0
4   4       23    90.0
5   5        5    10.0
6   6       79    99.0
7   7        2     5.0
8   8       11    32.5
9   9        5    10.0
10 10      100    99.0
11 11       55    97.0
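If the loop ever becomes a bottleneck, the same three rules could probably also be written without it. The following is only a sketch using base R's findInterval(); the helper names ss, lower, upper and exact are made up for illustration, and the cap at the highest percentile is the same extra rule as in the loop version.

```r
TestResults <- data.frame(ID = 1:11,
                          Observed = c(14, 8, 33, 23, 5, 79, 2, 11, 5, 100, 55))
ScoringSheet <- data.frame(Percentiles = c(99, 95, 85, 55, 10),
                           Score = c(79, 33, 20, 15, 5))

# findInterval() needs its break points in increasing order
ss <- ScoringSheet[order(ScoringSheet$Score), ]

# lower[i] = number of sheet scores <= Observed[i]; 0 means below the smallest score
lower <- findInterval(TestResults$Observed, ss$Score)
exact <- TestResults$Observed %in% ss$Score

# Row of the next higher sheet score, capped at the last row (extra rule for very high scores)
upper <- pmin(lower + 1, nrow(ss))
# Guard against index 0, which would silently drop elements when subsetting
lower_safe <- pmax(lower, 1)

TestResults$Results <- ifelse(
  lower == 0,
  ss$Percentiles[1] / 2,                                      # criterion 3
  ifelse(exact,
         ss$Percentiles[lower_safe],                          # criterion 1
         (ss$Percentiles[lower_safe] + ss$Percentiles[upper]) / 2))  # criterion 2
TestResults
```

On the example data this should reproduce the Results column shown above.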
Hope this helps!