
Applying logic to data frame columns in R

I hope I can do a satisfactory job of explaining my question. I can get R to do what I want, but it feels very clumsy, so I'm looking for a better way of attaining the same result.

I have a data frame that looks something like this (although I'm also open to other structures if they work better):

subject <- c(1,1,3,3)
day     <- c(3, 20, 1, 14)
status  <- c(1, 1, 1, 3)
df      <- cbind(subject, day, status)

I want to find the most efficient way to see, for example, if subject 1 has status 1 on day 3 (yes) or to test if on day 20 a subject has any status other than 3. So far my attempt is functional but clumsy and ugly.

has_event <- function(i, j, data) {
    any(data[(data[, "subject"] == i) & (data[, "status"] != 3), "day"] == j)
}

has_event(1, 3, df) # evaluates to TRUE
has_event(1, 4, df) # evaluates to FALSE

I don't see this method going very far, as the logic only becomes more complicated from there. I feel like I'm missing some very simple method of calling the data. If I wanted to see how many subjects did not have a status of 3 on a specific day, for example, it would look like this using my method:

length(unique(df[, "subject"])) - length(which(df[, "status"] == 3 & df[, "day"] == 14))

And that's just unmanageable.

The overall goal is to format my data in a way where I can access things easily by date or by subject, but I'm floundering right now, unsure of which avenue to investigate.

How about dplyr::filter()? Remember to convert your matrix to a data.frame first, then add the filter conditions one by one.

df<-data.frame(df)

require(dplyr)

filter(df,status!=3,day==20)

  subject day status
1       1  20      1  
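The same filter-based approach could be applied to the two checks from the question. This is only a sketch, assuming the sample data above and that dplyr is installed; the helper names has_event_df and n_without_status3 are made up for illustration, and the count mirrors the asker's own arithmetic (distinct subjects minus those with status 3 on the given day).

```r
library(dplyr)

# Sample data from the question, built directly as a data frame
df <- data.frame(
  subject = c(1, 1, 3, 3),
  day     = c(3, 20, 1, 14),
  status  = c(1, 1, 1, 3)
)

# Hypothetical helper: does subject `i` have a row on day `j`
# with a status other than 3?
has_event_df <- function(data, i, j) {
  nrow(filter(data, subject == i, day == j, status != 3)) > 0
}

# Hypothetical helper: number of distinct subjects minus the
# subjects that have status 3 on day `d`
n_without_status3 <- function(data, d) {
  with_status3 <- filter(data, day == d, status == 3)
  n_distinct(data$subject) - n_distinct(with_status3$subject)
}

has_event_df(df, 1, 3)     # TRUE
has_event_df(df, 1, 4)     # FALSE
n_without_status3(df, 14)  # 1
```

Each condition stays a separate argument to filter(), so further rules can be added without nesting more index expressions.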

Or with data.table

require(data.table)

data.table(df)[status!=3][day==20]
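With data.table the two conditions can also go into a single bracket, and a grouped count covers the "how many subjects on a given day" kind of question. A minimal sketch, assuming the question's sample data and an installed data.table; the names dt, hits, and per_day are illustrative only.

```r
library(data.table)

# Sample data from the question as a data.table
dt <- data.table(
  subject = c(1, 1, 3, 3),
  day     = c(3, 20, 1, 14),
  status  = c(1, 1, 1, 3)
)

# Both conditions in one `i` expression instead of chained brackets
hits <- dt[status != 3 & day == 20]

# Number of distinct subjects with a status other than 3, per day
per_day <- dt[status != 3, .(n_subjects = uniqueN(subject)), by = day]

hits
per_day
```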

Timing both on 100,000 records, dplyr comes out slightly faster here (the data.table call also includes the conversion from a data frame), but both are quick for this kind of simple filter:

df <- data.frame(subject = sample(1:5, 100000, replace = TRUE),
                 day     = sample(1:20, 100000, replace = TRUE),
                 status  = sample(1:10, 100000, replace = TRUE))

> system.time(data.table(df)[status!=3][day==20])
   user  system elapsed 
   0.01    0.00    0.02 
> system.time(filter(df,status!=3,day==20))
   user  system elapsed 
      0       0       0 

Using the sqldf package:

df <- data.frame(df)
require(sqldf)

sqldf("select * from df where status!=3 and day=20")

  subject day status
1       1  20      1
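If adding a package is not an option, the same conditions can be sketched in base R with subset(), which likewise needs a data frame rather than the matrix returned by cbind(). This is an alternative not used in the answers above; the names on_day_20 and by_subject are illustrative only.

```r
# Sample data from the question, built directly as a data frame
df <- data.frame(
  subject = c(1, 1, 3, 3),
  day     = c(3, 20, 1, 14),
  status  = c(1, 1, 1, 3)
)

# Rows on day 20 with any status other than 3
on_day_20 <- subset(df, status != 3 & day == 20)

# Split the rows by subject for easy per-subject access;
# the list elements are named after the subject values
by_subject <- split(df, df$subject)

on_day_20
by_subject[["1"]]  # all rows for subject 1
```

Splitting by df$day instead would give the same kind of lookup by date.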
