
Quick manipulation of data frame in R

I have the following example data frame:

> a = data.frame(a=c(1, 2, 3), b=c(10, 11, 12), c=c(1, 1, 0))
> a
  a  b c
1 1 10 1
2 2 11 1
3 3 12 0

For every row, I want to set a$a to a$b if a$c == 1; otherwise a$a should keep its value. The final data frame a should look like this:

> a
   a  b c
1 10 10 1
2 11 11 1
3  3 12 0

What is the fastest way to do this? My real data has hundreds of thousands of rows, so looping over the entire data frame and updating the rows one by one is extremely slow.

Thanks!

Easy as 1-2-3:

df = data.frame(a=c(1, 2, 3), b=c(10, 11, 12), c=c(1, 1, 0))
df$a[df$c == 1] <- df$b[df$c == 1]
df
##    a  b c
## 1 10 10 1
## 2 11 11 1
## 3  3 12 0

It reads: replace the elements of a in the rows where c == 1 with the elements of b from those same rows.
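As a small variation on the same idea, the row selector can be computed once and reused on both sides of the assignment. This is only a sketch: the name `idx` is introduced here for illustration, and wrapping the comparison in `which()` is an added precaution (it drops rows where the comparison yields NA), not something the original answer does.

```r
df <- data.frame(a = c(1, 2, 3), b = c(10, 11, 12), c = c(1, 1, 0))

# Compute the matching row positions once; which() discards NA comparisons
idx <- which(df$c == 1)

# Overwrite a with b only at those positions
df$a[idx] <- df$b[idx]
df
##    a  b c
## 1 10 10 1
## 2 11 11 1
## 3  3 12 0
```

On data without missing values this behaves the same as the one-liner above.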

A benchmark:

df <- data.frame(a=runif(100000), b=runif(100000), c=sample(c(1,0), 100000, replace=TRUE))
library(microbenchmark)
microbenchmark(df$a[df$c == 1] <- df$b[df$c == 1], df$a <- with(df, ifelse(c == 1, b, a)))
## Unit: milliseconds
##                                    expr      min       lq    median       uq       max neval
##      df$a[df$c == 1] <- df$b[df$c == 1] 13.85375 15.13073  16.61701  74.5387  88.47949   100
##  df$a <- with(df, ifelse(c == 1, b, a)) 44.23750 78.85029 103.01894 105.1750 118.09492   100

Alternatively, using ifelse() on the data frame from the question:

a$a <- with(a, ifelse(c == 1, b, a))
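
Before comparing timings, it can be worth confirming that the two approaches produce the same column. The following is a minimal sketch on random data; the seed, the size `n`, and the names `by_index` and `by_ifelse` are assumptions made for this example and do not appear in the answers.

```r
set.seed(42)  # arbitrary seed, only so the check is reproducible
n <- 1000
df <- data.frame(a = runif(n), b = runif(n),
                 c = sample(c(1, 0), n, replace = TRUE))

# Approach 1: logical subsetting with assignment
by_index <- df$a
by_index[df$c == 1] <- df$b[df$c == 1]

# Approach 2: vectorised ifelse()
by_ifelse <- with(df, ifelse(c == 1, b, a))

# Stops with an error if the two results differ
stopifnot(identical(by_index, by_ifelse))
```

Both are vectorised, so either avoids the row-by-row loop described in the question.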
