ddply not returning values from function split by variable

I'm using the ddply function (plyr) to calculate something separately by participant id (pid). However, for some reason it's not returning separate values by pid, but rather the same value across all pid.

Sample data:

sdt<-c("Hit","Hit","Miss","Miss","False Alarm","Correct Reject","Correct Reject","Correct Reject",
   "Hit","Hit","Hit","Miss","False Alarm","False Alarm","False ALarm","Correct Reject")

pid<-c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2)

adhd_p<-data.frame(sdt,pid)

Function:

ddply(adhd_p, "pid", summarise,
  hitrate=(count(adhd_p$sdt=="Hit")[[2,2]])/((count(adhd_p$sdt=="Hit")[[2,2]])+(count(adhd_p$sdt=="Miss")[[2,2]])),
  falsealarmrate=(count(adhd_p$sdt=="False Alarm")[[2,2]])/((count(adhd_p$sdt=="False Alarm")[[2,2]])+(count(adhd_p$sdt=="Correct Reject")[[2,2]])))

If it helps to understand what I'm calculating: participants can either "Hit" (respond affirmatively to target), "Miss" (do not respond to target), "Correct Reject" (do not respond to distractor), or "False Alarm" (respond affirmatively to distractor). Thus, "hitrate" is hits / (hits + misses), and "falsealarmrate" is false alarms / (false alarms + correct rejects).

What am I doing wrong?

Thanks for your time.

Edit: The above problem was solved very quickly by editing the code to:

ddply(adhd_p, "pid", summarise,
  hitrate=(count(sdt=="Hit")[[2,2]])/((count(sdt=="Hit")[[2,2]])+(count(sdt=="Miss")[[2,2]])),
  falsealarmrate=(count(sdt=="False Alarm")[[2,2]])/((count(sdt=="False Alarm")[[2,2]])+(count(sdt=="Correct Reject")[[2,2]])))

I realize now that I need to split over two variables rather than just one. However, adding a time variable:

time<-c(1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8)

And merging it in with the others:

adhd_p<-data.frame(sdt,pid,time)

Makes the new script produce a "subscript out of bounds" error:

ddply(adhd_p, .(pid,time), summarise,
  hitrate=(count(sdt=="Hit")[[2,2]])/((count(sdt=="Hit")[[2,2]])+(count(sdt=="Miss")[[2,2]])),
  falsealarmrate=(count(sdt=="False Alarm")[[2,2]])/((count(sdt=="False Alarm")[[2,2]])+(count(sdt=="Correct Reject")[[2,2]])))

Any thoughts?

What you need to be doing:

ddply(adhd_p, "pid", summarise,
  hitrate=(count(sdt=="Hit")[[2,2]])/((count(sdt=="Hit")[[2,2]])+(count(sdt=="Miss")[[2,2]])),
  falsealarmrate=(count(sdt=="False Alarm")[[2,2]])/((count(sdt=="False Alarm")[[2,2]])+(count(sdt=="Correct Reject")[[2,2]])))

Why you need to be doing it:

When you call ddply, the expressions are evaluated with .data (adhd_p in your case) as the local namespace. This is similar to calling attach(adhd_p): a bare column name, without an explicit reference to the data frame, still resolves to the correct column.

When you supply the summarise argument, the function splits the vectors in the local namespace by the id columns supplied (in this case, pid). So, if you reference columns without explicitly referencing the data frame, as above, each calculation uses only the portion of the sdt column corresponding to each pid. However, if you reference the data frame and column explicitly (adhd_p$sdt in your case), it pulls in the entire vector from the global namespace and doesn't split it, so every pid gets the same value.
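To make the difference concrete, here is a minimal sketch. It assumes plyr is installed and uses a small made-up data frame rather than the full sample data; the column names whole_column and this_group are only illustrative.

```r
library(plyr)  # provides ddply() and summarise()

# Tiny illustrative data set: pid 1 has one hit, pid 2 has two
adhd_p <- data.frame(
  sdt = c("Hit", "Miss", "Hit", "Hit"),
  pid = c(1, 1, 2, 2)
)

out <- ddply(adhd_p, "pid", summarise,
  whole_column = sum(adhd_p$sdt == "Hit"),  # explicit reference: the full column every time
  this_group   = sum(sdt == "Hit")          # bare name: only the rows for the current pid
)

print(out)
# whole_column is 3 for both pids; this_group is 1 for pid 1 and 2 for pid 2
```

The explicit adhd_p$sdt reference gives the same total in every row, which is the symptom described in the question.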

Edit: the code below is less messy and won't raise an error if one of the values is missing:

ddply(adhd_p, .(pid, time), summarise,
      hitrate=sum(sdt=="Hit")/(sum(sdt=="Hit")+sum(sdt=="Miss")),
      falsealarmrate=sum(sdt=="False Alarm")/(sum(sdt=="False Alarm")+sum(sdt=="Correct Reject")))
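As a rough illustration of why the count(...)[[2,2]] approach breaks once the data is split by both pid and time, consider a group that holds a single row, as every pid-by-time group does in the sample data. The variable names below (one_row, tab, res) are made up for this sketch, and plyr is assumed to be installed.

```r
library(plyr)  # provides count()

# A single observation, like one pid-by-time group in the sample data
one_row <- c("Hit")

# count() returns one row per distinct value, so only TRUE appears here
tab <- count(one_row == "Hit")
nrow(tab)  # 1, so there is no second row to index

# Asking for row 2 fails; capture the message instead of stopping
res <- tryCatch(tab[[2, 2]], error = function(e) conditionMessage(e))
print(res)

# sum() on a logical vector counts the TRUE values and returns 0 when there are none
hits   <- sum(one_row == "Hit")
misses <- sum(one_row == "Miss")
fa     <- sum(one_row == "False Alarm")
cr     <- sum(one_row == "Correct Reject")

hits / (hits + misses)  # 1
fa / (fa + cr)          # 0/0, which R reports as NaN rather than an error
```

With groups this small, a rate whose denominator is zero comes back as NaN, so it may still need handling downstream.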

I haven't delved into why what you are doing is wrong, but here is an answer that might help:

ddply(
  adhd_p, "pid", summarize, 
  hitrate=sum(sdt == "Hit") / sum(sdt %in% c("Hit", "Miss")),
  falsealarmrate=sum(sdt == "False Alarm") / sum(sdt %in% c("False Alarm", "Correct Reject"))
)

Produces:

  pid hitrate falsealarmrate
1   1    0.50      0.2500000
2   2    0.75      0.6666667
