Mark range based on identifier

This is based on a question I asked yesterday. It got very muddled, so I am trying again with a clearer question.

I have a large data set.

> head(raw)

  ps cond pass sample stim gsr
1  1    2    0      0    0 100
2  1    2    0      1    0 100
3  1    2    0      2    0 100
4  1    2    0      3    0 100
5  1    2    0      4    0 100
6  1    2    0      5    0 100

The $stim column is composed of various periods numbered 1-11 (each period lasts 20 $sample), with longer blocks of 0 (lasting for 140 $sample).

For every $stim==10, I need to mark a subsequent range:

For example:

count(raw$sample[raw$ps==1 & raw$stim==10]) #this counts for 1 subject as an example

      x freq
1  1100    1
2  1101    1
3  1102    1
4  1103    1
5  1104    1
6  1105    1
7  1106    1
8  1107    1
9  1108    1
10 1109    1
11 1110    1
12 1111    1
13 1112    1
14 1113    1
15 1114    1
16 1115    1
17 1116    1
18 1117    1
19 1118    1
20 1119    1

I want the range to begin 10 cells after the last cell with $stim==10. In this example the last such cell is $sample == 1119, so we start from 1120 and count 10 from there: 1130. The end of the range is 50 $sample after 1130, i.e. 1180.
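The arithmetic for this one subject can be written out as a quick check. This is only a sketch of the numbers in the example; the variable names are made up for illustration and are not part of the data set.

```r
# Sample numbers with stim == 10 for subject 1, as listed in the count() output.
stim10_samples <- 1100:1119

last_stim10 <- max(stim10_samples)  # last cell with stim == 10
first_after <- last_stim10 + 1      # first cell after the stim == 10 period
range_start <- first_after + 10     # count 10 cells from there
range_end   <- range_start + 50     # the range ends 50 samples later

c(start = range_start, end = range_end)
```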

So what I think I need is a new column in my raw file that marks as TRUE the cells to be used in the analysis. In the above example, these would be the cells between $sample == 1130 and 1180.

I don't want to go through this by hand. I am looking for a more automated way of ticking off the ranges.

I hope it is now clearer what I am aiming for.

Further information:

> sort(unique(rle(raw$n.filter)$length))
 40   50  590 1080 1130 1240 1400 1560 1720 1880 2030 2040 2200 2360

> summary(raw$stim)
     0      1      2      3      4      5      6      7      8      9     10     11 
286440   3720   3720   3720   3720   3720   3720   3720   3720   3720   3720   3720 

> summary(raw$stim[raw$ps==1])
   0    1    2    3    4    5    6    7    8    9   10   11 
1540   20   20   20   20   20   20   20   20   20   20   20 

> summary(raw$stim[raw$ps==186])
   0    1    2    3    4    5    6    7    8    9   10   11 
1540   20   20   20   20   20   20   20   20   20   20   20 

Edited answer (the previous one contained an error):

There are two ways of getting what you want. One is vectorized and fast; the other uses a loop and is slow.

1. Vectorized:

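# Row indices of every stim == 10 cell; assumes each run is exactly 20 rows long.
tmp <- which(raw$stim == 10)
ltmp <- seq_along(tmp)

raw$n.filter <- FALSE
# A run at rows i..i+19 needs rows i+30..i+79 marked (50 rows).
raw[tmp + 30, "n.filter"] <- TRUE   # rows i+30..i+49
raw[tmp + 50, "n.filter"] <- TRUE   # rows i+50..i+69
# Only the first 10 cells of each run are shifted by 70: rows i+70..i+79.
raw[tmp[ltmp[(ltmp %% 20) > 0 & (ltmp %% 20) < 11]] + 70, "n.filter"] <- TRUE
rle(raw$n.filter)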
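The vectorized version relies on every stim == 10 run being exactly 20 rows long. As a hedged alternative (not the code above, just a sketch using base R's rle()), the end of each run can be located directly and the window built from there. The toy data frame and the names runs, run_ends and idx are assumptions made for illustration.

```r
# Toy data (hypothetical): 140 zeros, periods 1-11 of 20 rows each, 140 zeros.
toy <- data.frame(stim = c(rep(0, 140), rep(1:11, each = 20), rep(0, 140)))

# Run-length encode the condition to find where each stim == 10 run ends.
runs <- rle(toy$stim == 10)
run_ends <- cumsum(runs$lengths)[runs$values]

# Skip 10 rows after each run, then take the next 50 rows.
idx <- unlist(lapply(run_ends, function(e) e + 11:60))
idx <- idx[idx <= nrow(toy)]  # drop any part of a window past the last row

toy$n.filter <- FALSE
toy$n.filter[idx] <- TRUE
rle(toy$n.filter)
```

Like the two versions in this answer, the sketch works on row positions over the whole data frame, so a window near the end of one subject's rows could spill into the next subject's.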

2. With loop:

raw$n.filter <- FALSE

for (counter in 2:nrow(raw)) {
    # counter is the first row after a run of stim == 10
    if (raw[counter - 1, "stim"] == 10 && raw[counter, "stim"] != 10) {
        # skip 10 rows, then mark the next 50
        raw[(counter + 10):(counter + 59), "n.filter"] <- TRUE
    }
}

rle(raw$n.filter)

I was too lazy to wait for the loop version to finish. It is best to save the result of one version, then run the other and check whether the two are all.equal().
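A small way to try that comparison without waiting on the full data set is to run both versions on a toy data frame. This is a sketch only: the toy layout (one hypothetical subject with a single stim == 10 run) and the names toy, vec and lp are assumptions, not part of the original data.

```r
# Toy data for one hypothetical subject: 140 zeros, periods 1-11 of
# 20 rows each, then 140 zeros (500 rows in total).
toy <- data.frame(stim = c(rep(0, 140), rep(1:11, each = 20), rep(0, 140)))

# Version 1: vectorized.
vec <- toy
tmp <- which(vec$stim == 10)
ltmp <- seq_along(tmp)
vec$n.filter <- FALSE
vec[tmp + 30, "n.filter"] <- TRUE
vec[tmp + 50, "n.filter"] <- TRUE
vec[tmp[ltmp[(ltmp %% 20) > 0 & (ltmp %% 20) < 11]] + 70, "n.filter"] <- TRUE

# Version 2: loop.
lp <- toy
lp$n.filter <- FALSE
for (counter in 2:nrow(lp)) {
    if (lp[counter - 1, "stim"] == 10 && lp[counter, "stim"] != 10) {
        lp[(counter + 10):(counter + 59), "n.filter"] <- TRUE
    }
}

# Both versions should flag the same rows.
all.equal(vec$n.filter, lp$n.filter)
range(which(vec$n.filter))
```

On this toy data the stim == 10 run occupies rows 321 to 340, so both versions should flag rows 351 to 400.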
