
Subset based on both columns and rows

I have an R data frame like this, containing prices at different times:

     product_1  product_2  product_3  product_4  product_5
 t1  10         10         10         0          14
 t2  20         0          50         15         15
 t3  30         0          60         12         12
 t4  40         14         15         5          0

What expression would give me the whole table (prices at all times), restricted to the products whose price is 0 at least once at or after a specific time, here t2? Basically, a subset of the data frame based on both row and column conditions. The expected result is:

     product_2  product_5
 t1  10         14
 t2  0          15
 t3  0          12
 t4  14         0

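One approach: first read in the data. Here the t4 price of product_4 is set to NA to show how missing values are handled: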

dd <- read.table(header=TRUE,text="
    product_1  product_2  product_3  product_4  product_5
 t1  10         10         10         0          14
 t2  20         0          50         15         15
 t3  30         0          60         12         12
 t4  40         14         15         NA          0")

Find the row index of the critical time:

which.time <- which(rownames(dd)=="t2")

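Define a function that identifies the columns to keep. The na.omit() call is necessary: without it, NAs end up in the logical vector that specifies which columns to keep, which leads to a slightly obscure "undefined columns selected" error. (One could also write any(na.omit(tail(x, -(which.time - 1)) == 0)); note that the count must be which.time - 1, because tail(x, -which.time) would drop the t2 row itself.)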

keepvar <- function(x) {
    ## TRUE if any non-missing value from the critical time onward is 0
    any(na.omit(x[which.time:length(x)]) == 0)
}
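A quick check of why the NA handling matters, with the data rebuilt so the snippet runs on its own. For product_4 the values from t2 onward are 15, 12 and NA, so comparing with 0 gives FALSE, FALSE, NA, and any() then returns NA rather than FALSE. Passing na.rm = TRUE to any() should be an equivalent alternative to na.omit(); the name keepvar2 is hypothetical, used only for illustration.

```r
dd <- read.table(header = TRUE, text = "
    product_1  product_2  product_3  product_4  product_5
 t1  10         10         10         0          14
 t2  20         0          50         15         15
 t3  30         0          60         12         12
 t4  40         14         15         NA          0")
which.time <- which(rownames(dd) == "t2")

x <- dd$product_4[which.time:nrow(dd)]   # 15, 12, NA
any(x == 0)                 # NA: one comparison is NA and none is TRUE
any(x == 0, na.rm = TRUE)   # FALSE: the NA is dropped before any() decides

## hypothetical alternative to keepvar() that drops NAs inside any()
keepvar2 <- function(x) any(x[which.time:length(x)] == 0, na.rm = TRUE)
sapply(dd, keepvar2)
```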

Now do the actual selection:

dd[sapply(dd,keepvar)]
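If you need this for several cutoff times, the steps above can be wrapped into a single function. This is a sketch under my own assumptions: the name subset_zero_after and its arguments are made up for illustration, the cutoff is matched against the row names, and an unknown label raises an error instead of silently producing an empty index.

```r
## hypothetical helper: keep all rows, but only the columns that contain
## at least one 0 at or after the row labelled `time`
subset_zero_after <- function(d, time) {
  start <- match(time, rownames(d))
  if (is.na(start)) stop("no row named ", time)
  rows <- start:nrow(d)
  ## one TRUE/FALSE per column; NAs are ignored
  keep <- vapply(d, function(x) any(x[rows] == 0, na.rm = TRUE), logical(1))
  d[keep]
}

dd <- read.table(header = TRUE, text = "
    product_1  product_2  product_3  product_4  product_5
 t1  10         10         10         0          14
 t2  20         0          50         15         15
 t3  30         0          60         12         12
 t4  40         14         15         NA          0")

subset_zero_after(dd, "t2")   # product_2 and product_5
subset_zero_after(dd, "t1")   # also product_4, which is 0 at t1
```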

Another approach, assuming your data frame is called df (the original data, without NAs):

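# rows at or after "t2": cumsum() switches to 1 at the "t2" row and stays there
after_t2 <- as.logical(cumsum(rownames(df) == "t2"))
# keep columns with at least one 0 in those rows; drop = FALSE keeps a
# data frame even if only one column qualifies
df[, apply(df, 2, function(x) any(x[after_t2] == 0)), drop = FALSE]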
   product_2 product_5
t1        10        14
t2         0        15
t3         0        12
t4        14         0
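The same idea can also be written without a per-column function: compare all rows from t2 onward with 0 in one step, which yields a logical matrix, and count the zeros per column with colSums(). This is a sketch assuming the question's original data stored as df; na.rm = TRUE is there only in case NAs appear, and drop = FALSE keeps the result a data frame even if just one column qualifies.

```r
df <- read.table(header = TRUE, text = "
    product_1  product_2  product_3  product_4  product_5
 t1  10         10         10         0          14
 t2  20         0          50         15         15
 t3  30         0          60         12         12
 t4  40         14         15         5           0")

start <- match("t2", rownames(df))
## logical matrix: TRUE where a price at or after t2 equals 0
is_zero <- df[start:nrow(df), ] == 0
## keep the columns with at least one such zero
result <- df[, colSums(is_zero, na.rm = TRUE) > 0, drop = FALSE]
result
```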
