Loop over rows of dataframe applying function with if-statement

I'm new to R and I'm trying to sum two columns of a given data frame, but only in rows where both elements to be summed satisfy a given condition. To make things clear, what I want to do is:

> t.d<-as.data.frame(matrix(1:9,ncol=3))
> t.d
  V1 V2 V3
1  1  4  7
2  2  5  8
3  3  6  9

> t.d$V4<-rep(0,nrow(t.d))

> for (i in 1:nrow(t.d)){
+   if (t.d$V1[i]>1 && t.d$V3[i]<9){
+     t.d$V4[i]<-t.d$V1[i]+t.d$V3[i]}
+     }

> t.d    
  V1 V2 V3 V4
1  1  4  7  0
2  2  5  8 10
3  3  6  9  0

I need efficient code, as my real data frame has about 150000 rows and 200 columns. This gives an error:

t.d$V4<-t.d$V1[t.d$V1>1]+ t.d$V3[t.d$V3>9] 

Is "apply" an option? I tried this:

t.d<-as.data.frame(matrix(1:9,ncol=3))
t.d$V4<-rep(0,nrow(t.d))

my.fun<-function(x,y){
  if(x>1 && y<9){
    x+y}
}

t.d$V4<-apply(X=t.d,MAR=1,FUN=my.fun,x=t.d$V1,y=t.d$V3)

but it gives an error as well. Thanks very much for your help.

This operation doesn't require loops, apply statements or if statements. Vectorised operations and subsetting are all you need:

t.d <- within(t.d, V4 <- V1 + V3)
t.d[!(t.d$V1>1 & t.d$V3<9), "V4"] <- 0
t.d

  V1 V2 V3 V4
1  1  4  7  0
2  2  5  8 10
3  3  6  9  0

Why does this work?

In the first step I create a new column that is the straight sum of columns V1 and V3. I use within as a convenient way of referring to the columns of t.d without having to write t.d$V all the time.

In the second step I subset all of the rows that don't fulfill your conditions and set V4 for these to 0.
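The two steps can also be written out with the condition stored in its own variable, which makes it easy to inspect which rows are kept. This is only a sketch on the small example from the question; the name keep is introduced here for illustration. Note the single &, which compares element by element, whereas the && used in the loop version is meant for single values.

```r
# Rebuild the small example data frame from the question.
t.d <- as.data.frame(matrix(1:9, ncol = 3))

# Step 1: unconditional row-wise sum of V1 and V3.
t.d$V4 <- t.d$V1 + t.d$V3

# Step 2: logical vector that is TRUE where both conditions hold.
# `keep` is a hypothetical name used only in this sketch.
keep <- t.d$V1 > 1 & t.d$V3 < 9
print(keep)

# Set V4 to 0 in every row where the condition fails.
t.d$V4[!keep] <- 0
print(t.d)
```

On this data only the second row satisfies both conditions, so it is the only row that keeps its sum.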

ifelse is your friend here:

t.d$V4<-ifelse((t.d$V1>1)&(t.d$V3<9), t.d$V1+ t.d$V3, 0)
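As a small variation, the same call can be wrapped in with() to avoid repeating t.d$, and it is worth knowing how ifelse treats missing values: where the condition itself is NA, the result is NA rather than 0. The vectors x and y below are made up for illustration and are not part of the question's data.

```r
# Same ifelse logic on the question's example, using with() for brevity.
t.d <- as.data.frame(matrix(1:9, ncol = 3))
t.d$V4 <- with(t.d, ifelse(V1 > 1 & V3 < 9, V1 + V3, 0))
print(t.d)

# Hypothetical vectors showing the NA behaviour.
x <- c(NA, 2, 3)
y <- c(7, 8, 9)

# NA > 1 is NA, and NA & TRUE is NA, so the first result is NA.
res <- ifelse(x > 1 & y < 9, x + y, 0)
print(res)
```

If rows with missing values should count as failing the condition, one option is to replace the NA entries of the condition with FALSE before calling ifelse.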

I'll chip in and provide yet another version. Since you want zero if the condition doesn't match, and TRUE/FALSE are glorified versions of 1/0, simply multiplying by the condition also works:

t.d<-as.data.frame(matrix(1:9,ncol=3))
t.d <- within(t.d, V4 <- (V1+V3)*(V1>1 & V3<9))
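A quick illustration of why the multiplication works, plus one caveat, assuming the data might contain NA or infinite values (the question does not say either way): a logical vector is coerced to 0/1 in arithmetic, but NA times 0 is still NA and Inf times 0 is NaN, so such rows do not become 0 the way they do with the subsetting approach. The names cond and sums are made up for this sketch.

```r
# Hypothetical condition and sums, matching the small example's values.
cond <- c(FALSE, TRUE, FALSE)
sums <- c(8, 10, 12)

# Logical values are coerced to 0/1 in arithmetic.
print(as.integer(cond))
print(sums * cond)

# Caveat: special values are not zeroed by multiplying with FALSE.
print(NA * FALSE)    # still NA
print(Inf * FALSE)   # NaN
```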

...and it happens to be faster than the other solutions ;-)

t.d <- data.frame(V1=runif(2e7, 1, 2), V2=1:2e7, V3=runif(2e7, 5, 10))
system.time( within(t.d, V4 <- (V1+V3)*(V1>1 & V3<9)) )         # 3.06 seconds
system.time( ifelse((t.d$V1>1)&(t.d$V3<9), t.d$V1+ t.d$V3, 0) ) # 5.08 seconds
system.time( { t.d <- within(t.d, V4 <- V1 + V3); 
               t.d[!(t.d$V1>1 & t.d$V3<9), "V4"] <- 0 } )       # 4.50 seconds
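Timings like these depend on the machine and the R version, so before comparing speed it may be worth confirming that the three approaches agree. The sketch below does that on a smaller data frame. Two things differ from the benchmark above and are assumptions of this sketch: the size n is reduced, and V1 is drawn from 0 to 2 rather than 1 to 2 so that the V1 > 1 condition actually fails for some rows. The variable names are illustrative.

```r
set.seed(1)                      # make the random data reproducible
n <- 1e5                         # smaller than the benchmark above
d <- data.frame(V1 = runif(n, 0, 2),
                V2 = seq_len(n),
                V3 = runif(n, 5, 10))

# Approach 1: multiply the sum by the logical condition.
by_mult <- with(d, (V1 + V3) * (V1 > 1 & V3 < 9))

# Approach 2: ifelse.
by_ifelse <- ifelse(d$V1 > 1 & d$V3 < 9, d$V1 + d$V3, 0)

# Approach 3: sum everything, then zero out the rows failing the condition.
by_subset <- d$V1 + d$V3
by_subset[!(d$V1 > 1 & d$V3 < 9)] <- 0

# All three should give the same values.
stopifnot(isTRUE(all.equal(by_mult, by_ifelse)),
          isTRUE(all.equal(by_mult, by_subset)))
```

Each expression can then be wrapped in system.time() as above to compare the timings on your own data.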
