
R: Subsetting with two variables

This is a follow-up to: R: Subsetting on increasing value to max excluding the decreasing

The posted solution works and now I would like to add a low cutoff based on a second variable. Thus far I'm not certain about how to approach this with data.table. As an example, I would like to restrict output to max of B and all values after the first instance of D == 1 by TrialNum. I assume this means extracting and using the index (using which?) associated with the low cutoff of D.

TrialNum,Obs,A,B,C,D
1,1,23,1,23,1
1,2,21,2,21,2
1,3,14,3,14,1
1,4,34,4,34,3
1,5,32,5,32,2
1,6,21,3,21,1
1,7,16,5,16,3
1,8,18,2,18,1 
2,1,26,1,26,1
2,2,11,2,11,2
2,3,23,3,23,1
2,4,12,4,12,1
2,5,3,2,3,1
2,6,4,3,4,3
2,7,22,1,22,1
2,8,15,2,15,1

Expected output,

TrialNum,Obs,A,B,C,D
1,2,21,2,21,2
1,3,14,3,14,1
1,4,34,4,34,3
1,5,32,5,32,2
2,2,11,2,11,2
2,3,23,3,23,1
2,4,12,4,12,1

So, it's just the first instance of the low cutoff. I don't wish to lose data where D drops below the threshold after the starting point has been identified. As with the solution posted yesterday, I've tried variations of using which in the expression to capture both max(B) and the low cutoff associated with D.

A data.table solution is preferable because it seems currently data.table and dplyr are incompatible on Windows R3.2.0.

To solve your problem, think about how to find the row numbers you are after.

Assume for the moment that the data frame contains just one TrialNum. In your previous question, you learned that which.max(B) gives the row with the maximum value of B.

To find the rows where D is 1, use which(D==1). If several rows have D equal to 1, which returns all of their indices (see ?which), so take [1] to keep only the index of the first occurrence. Since you don't want to include the D==1 row itself, add 1 to the index: which(D==1)[1] + 1.

With these two numbers, you just want all the rows from the first to the second, i.e. (which(D==1)[1] + 1):which.max(B).
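As a quick check of the index arithmetic, here is a minimal base-R sketch using only the B and D columns of TrialNum 1 from the question. The variable names first_low, peak and rows are illustrative and not part of the original answer.

```r
# B and D for TrialNum 1, copied from the question's data.
B <- c(1, 2, 3, 4, 5, 3, 5, 2)
D <- c(1, 2, 1, 3, 2, 1, 3, 1)

first_low <- which(D == 1)[1]    # index of the first row where D == 1
peak      <- which.max(B)        # index of the first maximum of B
rows      <- (first_low + 1):peak

rows
# 2 3 4 5
```

Note that which.max returns the first maximum, so the second B == 5 at Obs 7 is ignored, which matches the expected output.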

Then combine this with by=TrialNum, so that the expression is evaluated separately for each TrialNum and each group only ever contains one trial:

x[, .SD[(which(D==1)[1] + 1):which.max(B)], by=TrialNum]
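For a reproducible run, the following sketch rebuilds the question's data as a data.table and applies the expression above. It assumes the data.table package is installed; the object names x and res are only illustrative.

```r
library(data.table)

# The question's data; C is identical to A in the posted sample.
A <- c(23, 21, 14, 34, 32, 21, 16, 18, 26, 11, 23, 12, 3, 4, 22, 15)
x <- data.table(
  TrialNum = rep(1:2, each = 8),
  Obs      = rep(1:8, times = 2),
  A        = A,
  B        = c(1, 2, 3, 4, 5, 3, 5, 2, 1, 2, 3, 4, 2, 3, 1, 2),
  C        = A,
  D        = c(1, 2, 1, 3, 2, 1, 3, 1, 1, 2, 1, 1, 1, 3, 1, 1)
)

# Per trial: rows after the first D == 1, up to and including the first max of B.
res <- x[, .SD[(which(D == 1)[1] + 1):which.max(B)], by = TrialNum]

res$Obs
# 2 3 4 5 2 3 4
```

The Obs values correspond to the seven rows listed under the expected output in the question.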

(Note: what will you do if there is no row where D==1? In that case which(D==1)[1] is NA and the : operator stops with an error, so you will have to think about how to handle that.)
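One possible way to handle that case is sketched below. The helper rows_after_cutoff is hypothetical and not part of the original answer, and dropping a trial that never reaches the cutoff is an assumption about the desired behaviour. The helper also returns no rows when the first D == 1 is at or after the maximum of B, because a:b counts downward when a > b. The small data set is made up for illustration.

```r
library(data.table)

# Hypothetical helper: indices strictly after the first D == cutoff,
# up to and including the first maximum of B; integer(0) if there are none.
rows_after_cutoff <- function(B, D, cutoff = 1) {
  start <- which(D == cutoff)[1] + 1   # NA when D never equals the cutoff
  end   <- which.max(B)
  if (is.na(start) || start > end) return(integer(0))
  seq(start, end)
}

# Made-up example: trial 2 never has D == 1.
x <- data.table(
  TrialNum = c(1, 1, 1, 2, 2, 2),
  B        = c(1, 3, 2, 1, 2, 3),
  D        = c(1, 2, 2, 2, 3, 2)
)

# Trials for which the helper returns integer(0) contribute no rows.
res <- x[, .SD[rows_after_cutoff(B, D)], by = TrialNum]
res
```

If a trial without D == 1 should instead be kept from its first row, the helper could return seq_len(which.max(B)) in that branch.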
