
Replace loops with apply family functions (or dplyr), using logical functions in R

I have created this representative data frame and assign condition categories to it using a for loop.

library(dplyr)  # provides %>% and mutate()

df <- data.frame(Date=c("08/29/2011", "08/29/2011", "08/30/2011", "08/30/2011", "08/30/2011", "08/29/2012", "08/29/2012", "01/15/2012", "08/29/2012"),
             Time=c("09:45", "10:00", "13:00", "13:30", "10:14", "9:09", "11:23", "17:06", "12:20"),
             Diff = c(0.2,4.3,6.5,15.0, 16.5, 31, 30.2, 21.9, 1.9))

# Label the best category up front; every other row starts as "TBD"
df1 <- df %>%
  mutate(Accuracy = ifelse(Diff <= 3, "Excellent", "TBD"))

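# Overwrite the "TBD" placeholder row by row; rows with Diff <= 3 keep "Excellent"
for (i in seq_len(nrow(df1))) {
  if (df1$Diff[i] > 3 && df1$Diff[i] <= 10) {
    df1$Accuracy[i] <- "Good"
  } else if (df1$Diff[i] > 10 && df1$Diff[i] <= 15) {
    df1$Accuracy[i] <- "Fair"
  } else if (df1$Diff[i] > 15 && df1$Diff[i] <= 30) {
    df1$Accuracy[i] <- "Poor"
  } else if (df1$Diff[i] > 30) {
    df1$Accuracy[i] <- "Unacceptable"
  }
}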

My actual dataset is very large, and what I have read indicates that for loops are usually not the most efficient way to code in R. I believe I can do the same thing by creating a logical vector for each condition, where an element is TRUE when that condition is met. Then I can assign the values by subsetting, for example df1$Accuracy[Good] <- "Good". However, I cannot figure out how to create the logical vectors using the apply family functions or dplyr functions. (Any solution that avoids for loops is also welcome.) If a for loop is actually the better way to go, that would also be helpful to know.
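A minimal base-R sketch of the logical-vector approach described above. It assumes only the Diff column from the sample data is needed; the vector names Good, Fair, Poor and Unacceptable are illustrative. Comparison operators in R are already vectorised, so no apply call is required to build the vectors.

```r
# Same Diff values as the sample data; only this column is needed here
df1 <- data.frame(Diff = c(0.2, 4.3, 6.5, 15.0, 16.5, 31, 30.2, 21.9, 1.9))

# Start every row with either "Excellent" or the "TBD" placeholder
df1$Accuracy <- ifelse(df1$Diff <= 3, "Excellent", "TBD")

# One vectorised comparison per category; `&` (not `&&`) works element-wise
Good         <- df1$Diff > 3  & df1$Diff <= 10
Fair         <- df1$Diff > 10 & df1$Diff <= 15
Poor         <- df1$Diff > 15 & df1$Diff <= 30
Unacceptable <- df1$Diff > 30

# Logical subsetting assigns only to the rows where the vector is TRUE
df1$Accuracy[Good]         <- "Good"
df1$Accuracy[Fair]         <- "Fair"
df1$Accuracy[Poor]         <- "Poor"
df1$Accuracy[Unacceptable] <- "Unacceptable"

print(df1)
```

The key difference from the loop is `&` versus `&&`: `&&` only looks at a single value, while `&` compares whole vectors at once.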

Here are my failed attempts. They return incorrect NAs or incorrect logical vectors. One of the many things I do not understand is how lapply knows whether to go over columns or rows.

Good<-apply(df1, 1, function(x) ifelse(df1$Diff[x]>3&& df1$Diff[x]<=10, TRUE, FALSE)) #logical, TRUE where condition is true 
Good<-unlist(lapply(df1$Diff,  function(x) {(ifelse(df1$Diff[x]>3&& df1$Diff[x]<=10, TRUE, FALSE))}))
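The attempts above go wrong because of what each function passes to `function(x)`. `lapply`/`sapply` over `df1$Diff` pass each *value* (0.2, 4.3, ...), so `df1$Diff[x]` uses the value as a position. `apply(df1, 1, ...)` first converts the whole data frame to a matrix (a character matrix here, because of Date and Time) and passes each *row*, so `df1$Diff[x]` indexes by strings and yields NA; recent R versions also reject a length > 1 operand to `&&` outright. A data frame is a list of columns, which is why `lapply` on a data frame goes over columns. The following is a sketch of corrected versions, assuming only the Diff column is needed:

```r
df1 <- data.frame(Diff = c(0.2, 4.3, 6.5, 15.0, 16.5, 31, 30.2, 21.9, 1.9))

# sapply hands the function each *value* of Diff, so test x directly
Good_sapply <- sapply(df1$Diff, function(x) x > 3 && x <= 10)

# vapply is the type-safe variant: it errors unless each call returns one logical
Good_vapply <- vapply(df1$Diff, function(x) x > 3 && x <= 10, logical(1))

# apply(..., 1, f) hands f each *row* as a vector after converting to a matrix;
# keep only the numeric column so Diff is not turned into text
Good_apply <- apply(df1[, "Diff", drop = FALSE], 1,
                    function(row) row[1] > 3 && row[1] <= 10)

# The vectorised comparison gives the same result without any loop
Good_vec <- df1$Diff > 3 & df1$Diff <= 10

# lapply on a data frame iterates over its columns (one list element per column)
col_classes <- lapply(data.frame(a = 1:2, b = c("x", "y")), class)
print(names(col_classes))
```

All four logical vectors agree; the last one is the idiomatic choice for a large dataset, since it makes one pass in compiled code instead of calling an R function per element.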

Update: Nested ifelse statements work, but any suggestions on how to use apply are still welcome.

df1 <- df %>%
  mutate(Accuracy = ifelse(Diff <= 3, "Excellent",
                    ifelse(Diff > 3 & Diff <= 10, "Good",
                    ifelse(Diff > 10 & Diff <= 15, "Fair",
                    ifelse(Diff > 15 & Diff <= 30, "Poor",
                    ifelse(Diff > 30, "Unpublishable", "TBD"))))))
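Because the categories are contiguous numeric ranges, base R's `cut()` can replace the whole nested ifelse with one call. This is a sketch, not the approach used in the thread; it uses the loop's "Unacceptable" label and assumes the same break points. Unlike the ifelse version, an NA in Diff stays NA rather than becoming "TBD".

```r
df1 <- data.frame(Diff = c(0.2, 4.3, 6.5, 15.0, 16.5, 31, 30.2, 21.9, 1.9))

# Break points between categories; right = TRUE makes each interval (lower, upper],
# which matches the <= tests in the loop (e.g. exactly 15 is "Fair")
breaks <- c(-Inf, 3, 10, 15, 30, Inf)
labels <- c("Excellent", "Good", "Fair", "Poor", "Unacceptable")

# cut() returns a factor; convert to character to match the loop's output
df1$Accuracy <- as.character(cut(df1$Diff, breaks = breaks,
                                 labels = labels, right = TRUE))
print(df1)
```

Adding or moving a category then only means editing `breaks` and `labels`, with no risk of gaps between hand-written conditions.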

You could use case_when from dplyr:

# Conditions are checked in order and the first match wins,
# so each line only needs an upper bound
df1 <- df %>%
  mutate(Accuracy = case_when(
    Diff <= 3  ~ "Excellent",
    Diff <= 10 ~ "Good",
    Diff <= 15 ~ "Fair",
    Diff <= 30 ~ "Poor",
    Diff > 30  ~ "Unpublishable",
    TRUE       ~ "TBD")
  )

 df1
        Date  Time Diff      Accuracy
1 08/29/2011 09:45  0.2     Excellent
2 08/29/2011 10:00  4.3          Good
3 08/30/2011 13:00  6.5          Good
4 08/30/2011 13:30 15.0          Fair
5 08/30/2011 10:14 16.5          Poor
6 08/29/2012  9:09 31.0 Unpublishable
7 08/29/2012 11:23 30.2 Unpublishable
8 01/15/2012 17:06 21.9          Poor
9 08/29/2012 12:20  1.9     Excellent
