I hope I can do a satisfactory job of explaining my question. I can get R to do what I want, but it feels very clumsy, so I'm looking for a better way of attaining the same result.
I have a data frame that looks something like this (although I'm also open to other structures if they work better):
subject <- c(1,1,3,3)
day <- c(3, 20, 1, 14)
status <- c(1, 1, 1, 3)
df <- cbind(subject, day, status)
I want to find the most efficient way to see, for example, if subject 1 has status 1 on day 3 (yes) or to test if on day 20 a subject has any status other than 3. So far my attempt is functional but clumsy and ugly.
has_event <- function(i, j, data) {
  any(data[(data[, "subject"] == i) & (data[, "status"] != 3), "day"] == j)
}
has_event(1, 3, df) # evaluates to TRUE
has_event(1, 4, df) # evaluates to FALSE
I don't see this method going very far, as the logic only becomes more complicated from there. I feel like I'm missing some very simple method of calling the data. If I wanted to see how many subjects did not have a status of 3 on a specific day, for example, it would look like this using my method:
length(unique(df[, "subject"])) - length(which(df[, "status"] == 3 & df[, "day"] == 14))
And that's just unmanageable.
The overall goal is to format my data in a way where I can access things easily by date or by subject, but I'm floundering right now, unsure of which avenue to investigate.
How about dplyr::filter()? Remember to convert your matrix to a data.frame first, then add the filter conditions one by one:
df <- data.frame(df)
library(dplyr)
filter(df, status != 3, day == 20)
subject day status
1 1 20 1
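The two checks from the question can be written the same way. This is a minimal sketch, assuming dplyr is installed and that the data are built directly as a data frame; `n_total` and `n_status3` are illustrative names, not part of the original code.

```r
library(dplyr)

# Sample data from the question, built as a data frame rather than a matrix
df <- data.frame(subject = c(1, 1, 3, 3),
                 day     = c(3, 20, 1, 14),
                 status  = c(1, 1, 1, 3))

# Does subject i have a status other than 3 recorded on day j?
has_event <- function(i, j, data) {
  nrow(filter(data, subject == i, day == j, status != 3)) > 0
}

has_event(1, 3, df)  # TRUE
has_event(1, 4, df)  # FALSE

# How many subjects did not have a status of 3 on day 14?
n_total   <- n_distinct(df$subject)
n_status3 <- df %>%
  filter(day == 14, status == 3) %>%
  distinct(subject) %>%
  nrow()
n_total - n_status3  # 1
```

Each condition is a separate argument to filter(), so further criteria can be added without nesting more index expressions.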
Or with data.table:
library(data.table)
data.table(df)[status != 3][day == 20]
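For access by day or by subject, data.table's grouping can also help. The following is a rough sketch, assuming data.table is installed; `dt`, `by_day`, and `n_subjects` are names made up for this illustration.

```r
library(data.table)

# Sample data from the question as a data.table
dt <- data.table(subject = c(1, 1, 3, 3),
                 day     = c(3, 20, 1, 14),
                 status  = c(1, 1, 1, 3))

# Both conditions in a single subset instead of two chained ones
dt[status != 3 & day == 20]

# Number of distinct subjects with a status other than 3, per day
by_day <- dt[status != 3, .(n_subjects = uniqueN(subject)), by = day]
by_day

# All rows for one subject
dt[subject == 1]
```

Grouping with `by = day` computes the per-day counts in one pass, so there is no need to loop over days and call a helper for each one.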
Timing both on 100,000 records, dplyr is faster, but both are quick for simple filters like these (note that the data.table timing below includes the conversion done by data.table(df)):
df <- data.frame(subject = sample(1:5, 100000, TRUE), day = sample(1:20, 100000, TRUE), status = sample(1:10, 100000, TRUE))
> system.time(data.table(df)[status!=3][day==20])
user system elapsed
0.01 0.00 0.02
> system.time(filter(df,status!=3,day==20))
user system elapsed
0 0 0
Using the sqldf package:
df <- data.frame(df)
library(sqldf)
sqldf("select * from df where status!=3 and day=20")
subject day status
1 1 20 1