R: assign value based on criteria
I am trying to convert test scores from a psychological questionnaire, stored in one data set, to standardized scores (percentile ranges) defined in another data set.
The test scores are one score each from 9 people who took my questionnaire:
TestResults <- data.frame(ID = c(1:9),
Observed = c(14, 8, 33, 23, 5, 79, 2, 11, 5), Results = NA)
The scoring sheet from the test publisher, implemented manually in R and shortened here for simplicity:
ScoringSheet <- data.frame(Percentiles = c(99,95,85,55,10), Score = c(79,33,20,15,5))
I would like to fill the Results column with the percentile values from ScoringSheet that correspond to the observed scores. The scoring follows a simple algorithm, which I just can't manage to implement in R:
1. If TestResults$Observed %in% ScoringSheet$Score, then Results should be the corresponding Percentiles value in the ScoringSheet.
2. If !(TestResults$Observed %in% ScoringSheet$Score), then TestResults$Results should be the average of the two ScoringSheet$Percentiles values between which the Observed score falls.
3. If TestResults$Observed < min(ScoringSheet$Score), then the Results value for these smallest observed values should be min(ScoringSheet$Percentiles)/2.
As a result, I would need this:
TestResults <- data.frame(ID = c(1:9),
Observed = c(14, 8, 33, 23, 5, 79, 2, 11, 5),
Results = c(0.5,0.5,95,90,0.5,99,0.5,0.5,0.5))
So far, I can get the corresponding percentiles for criterion 1 by using merge() on TestResults$Observed and ScoringSheet$Score, which creates NAs for the values that do not match exactly. I am now wondering how to implement criteria 2 and 3.
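The merge() step described above might look roughly like the following. This is only a sketch under assumptions: the all.x = TRUE argument, the re-sorting by ID, and the omission of the empty Results column are illustrative choices, not necessarily what was actually run.

```r
# Data from the question (the empty Results column is left out here).
TestResults <- data.frame(ID = 1:9,
                          Observed = c(14, 8, 33, 23, 5, 79, 2, 11, 5))
ScoringSheet <- data.frame(Percentiles = c(99, 95, 85, 55, 10),
                           Score = c(79, 33, 20, 15, 5))

# Left join: keep every test result; scores without an exact match get NA.
merged <- merge(TestResults, ScoringSheet,
                by.x = "Observed", by.y = "Score", all.x = TRUE)

# merge() sorts by the key column, so restore the original ID order.
merged <- merged[order(merged$ID), ]
print(merged)
```

Only the observed scores 33, 5 and 79 appear in the scoring sheet, so only those rows receive a percentile; all others are NA and still need criteria 2 and 3.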
Thank you in advance!
Probably not the nicest solution, but it does the job. First we sort the ScoringSheet, then we use match to find exact matches. Finally we loop over all the records where no exact match was found and apply your calculation there. I added a rule for scores higher than the 99th-percentile score: they receive the highest percentile value. I also added two more entries to show that the code below works properly.
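TestResults <- data.frame(ID = 1:11,
                          Observed = c(14, 8, 33, 23, 5, 79, 2, 11, 5, 100, 55),
                          Results = NA)
ScoringSheet <- data.frame(Percentiles = c(99, 95, 85, 55, 10),
                           Score = c(79, 33, 20, 15, 5))

# Sort the scoring sheet by ascending score.
ScoringSheet <- ScoringSheet[order(ScoringSheet$Score, decreasing = FALSE), ]

# Criterion 1: exact matches; everything else stays NA for now.
TestResults$Results <- ScoringSheet$Percentiles[match(TestResults$Observed, ScoringSheet$Score)]

for (i in which(is.na(TestResults$Results))) {
  # Index of the largest sheet score below the observed score (empty if none).
  x <- tail(which(TestResults$Observed[i] > ScoringSheet$Score), 1)
  if (length(x) > 0) {
    # Criterion 2: average the two neighbouring percentiles;
    # above the top score, min() caps the index so the result is the top percentile.
    TestResults$Results[i] <- mean(ScoringSheet$Percentiles[c(x, min(x + 1, nrow(ScoringSheet)))])
  } else {
    # Criterion 3: below the lowest score, use half the lowest percentile.
    TestResults$Results[i] <- ScoringSheet$Percentiles[1] / 2
  }
}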
Output:
   ID Observed Results
1   1       14    32.5
2   2        8    32.5
3   3       33    95.0
4   4       23    90.0
5   5        5    10.0
6   6       79    99.0
7   7        2     5.0
8   8       11    32.5
9   9        5    10.0
10 10      100    99.0
11 11       55    97.0
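As an alternative to the loop above, the same three rules could probably be written without a loop using base R's findInterval(). The following is a minimal sketch, not a drop-in replacement: the helper name score_to_percentile is made up for illustration, and it assumes that the percentiles increase with the scores, as they do in the shortened sheet.

```r
# Hypothetical helper: maps observed scores to percentiles using the three
# rules from the question plus the "above the top score" rule from the answer.
score_to_percentile <- function(observed, sheet) {
  sheet <- sheet[order(sheet$Score), ]   # findInterval() needs ascending scores
  s <- sheet$Score
  p <- sheet$Percentiles
  n <- length(s)

  idx   <- findInterval(observed, s)     # number of sheet scores <= observed
  exact <- observed %in% s               # rule 1 applies to these
  lower <- pmax(idx, 1)                  # clamp so indexing stays valid
  upper <- pmin(idx + 1, n)              # capped at the top row

  ifelse(idx == 0,
         p[1] / 2,                                   # rule 3: below the lowest score
         ifelse(exact,
                p[lower],                            # rule 1: exact match
                (p[lower] + p[upper]) / 2))          # rule 2: average of neighbours
}

TestResults <- data.frame(ID = 1:11,
                          Observed = c(14, 8, 33, 23, 5, 79, 2, 11, 5, 100, 55))
ScoringSheet <- data.frame(Percentiles = c(99, 95, 85, 55, 10),
                           Score = c(79, 33, 20, 15, 5))

TestResults$Results <- score_to_percentile(TestResults$Observed, ScoringSheet)
print(TestResults)
```

On the example data this should give the same Results column as the loop. The sketch uses p[1], the percentile at the lowest score, where the question's rule 3 says min(ScoringSheet$Percentiles); the two agree only under the monotonicity assumption stated above.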
Hope this helps!