
Extract rows based on a specific condition in a data frame in R

I have a data frame with 77,760 rows and I want to extract only rows whose row numbers differ by 13: the 1st, 14th, 27th, 40th, 53rd, 66th, 79th, 92nd, 105th, 118th, 131st and 144th. After each multiple of 144, however, I want to take the very next row (145th, 289th, ...) and restart the same step-of-13 sequence from there. So after the 144th row I don't want the 157th row but the 145th, and the sequence continues 145th, 158th, 171st, ... until it reaches the next multiple of 144 (the 288th row), then 289th, 302nd, ... and so on up to the 77,760th row.

So far, following the answer to my previous post, I have used the following to extract every 13th row:

my_frame[seq(from = 1, to = nrow(my_frame), by = 13), ]

But now I want to effectively reset the row sequence after every multiple of 144 (the 144th, 288th, 432nd row, ...) and extract rows as described above.

Actual results I am getting: 1st, 14th... 144th, 157th, 170th... ...77,760th rows

Expected results: 1st, 14th... 144th, 145th, 158th... 288th, 289th... ...432nd, 433rd... ...77,760th

Can anyone help me with the logic?

You can generate the row numbers first and then use them to subset your data frame:

row_numbers <- c(sapply(seq(1, 77760, 144), function(x) seq(x, by = 13, length.out = 12)))

head(row_numbers, 50)
 [1]   1  14  27  40  53  66  79  92 105 118 131 144 145 158 171 184 197 210 223 236 
[21] 249 262 275 288 289 302 315 328 341 354 367 380 393 406 419 432 433 446 459 472
[41] 485 498 511 524 537 550 563 576 577 590

result <- your_df[row_numbers, ]
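
The same row numbers can also be derived with modular arithmetic, which avoids hardcoding the row count and never produces an index beyond `nrow()`. This is only a sketch: the small `your_df` below is hypothetical stand-in data, and `block_size` and `step` are names introduced here for the 144 and 13 from the question.

```r
# Hypothetical stand-in data; the real frame has 77,760 rows.
your_df <- data.frame(id = seq_len(432), value = seq_len(432) * 2)

n <- nrow(your_df)
block_size <- 144   # rows per block, from the question
step <- 13          # spacing inside a block, from the question

# 0-based position of each row inside its block of 144,
# then keep the positions that are multiples of 13 (0, 13, ..., 143).
offset <- (seq_len(n) - 1) %% block_size
row_numbers <- which(offset %% step == 0)

result <- your_df[row_numbers, ]
head(row_numbers, 14)
# 1 14 27 40 53 66 79 92 105 118 131 144 145 158
```

Because 143 = 11 * 13, the last position of every block is selected, which is why each block contributes exactly 12 rows.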

An option would be to split the data.frame into blocks of 144 rows and take every 13th row within each block

my_frame1 <- do.call(rbind, lapply(unname(split(my_frame, 
      (seq_len(nrow(my_frame)) - 1) %/% 144 + 1)),
           function(dat) dat[seq(1, nrow(dat), by = 13),]))

row.names(my_frame1)
#[1] "1"   "14"  "27"  "40"  "53"  "66"  "79"  "92"  "105" "118" "131" 
#[12] "144" "145" "158" "171" "184" "197" "210" "223" "236" "249" ...

It may also be better to split the sequence of row numbers instead of the data frame itself, and then subset the data frame once

s1 <-  seq_len(nrow(my_frame))
i1 <- unlist(lapply(unname(split(s1, (s1-1) %/% 144 + 1)),
                `[`, rep(c(TRUE, FALSE), c(1, 12))))
my_frame1 <- my_frame[i1,]
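
The logical mask above has length 13 and is recycled along each block of 144 row numbers. A minimal sketch of what happens inside a single full block (the names `block`, `mask` and `picked` are introduced here only for illustration):

```r
block <- 1:144                          # row numbers of the first block
mask <- rep(c(TRUE, FALSE), c(1, 12))   # one TRUE followed by twelve FALSE
picked <- block[mask]                   # the mask is recycled along block
picked
# 1 14 27 40 53 66 79 92 105 118 131 144
```

The 144th element is picked because 144 = 11 * 13 + 1, so the twelfth repetition of the mask starts exactly on the last row of the block.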

data

set.seed(24)
my_frame <- data.frame(col1 = sample(1:9, 1000, replace = TRUE), col2 = rnorm(1000))

Another option would be to use a while loop to generate the row numbers and then extract the data from those rows. An 'index' variable jumps from one row number to the next on every iteration of the loop. If 'index' is a multiple of 144, it is incremented by 1; otherwise it is incremented by 13. Every value that 'index' takes is appended to the 'imp_row' vector.

index = 1
final_row = nrow(data_frame_name)
# number of rows; this is used to stop the while loop
imp_row = c() # this will hold all the important row numbers
while (index <= final_row) { # generate row numbers up to and including the final row
  imp_row = append(imp_row, index)
  if ((index %% 144) == 0) {
    index = index + 1  # end of a block of 144: move to the next row
  } else {
    index = index + 13 # otherwise skip ahead by 13
  }
}

head(imp_row, 20)
# now you can index your data frame via the imp_row vector as: data_frame_name[imp_row, ]
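
The loop logic can be checked on a small case. The sketch below wraps it in a hypothetical helper `loop_rows()` (not part of the answer above) and uses `<=` in the loop condition so that a final row that is itself a multiple of 144 is included; the 288-row input is an assumption chosen for illustration.

```r
# Hypothetical helper: returns the row numbers the loop visits for n rows.
loop_rows <- function(n, block_size = 144, step = 13) {
  rows <- integer(0)
  index <- 1
  while (index <= n) {
    rows <- c(rows, index)
    # At the end of a block, move to the first row of the next block;
    # otherwise skip ahead by `step`.
    index <- if (index %% block_size == 0) index + 1 else index + step
  }
  rows
}

rows <- loop_rows(288)
length(rows)    # 24
tail(rows, 3)   # 262 275 288
```

Growing a vector inside a loop is fine at this size; for much larger inputs a vectorized construction of the row numbers would usually be preferable.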

Alternatively, you can skip recording the 'index' values in 'imp_row' and use each 'index' value directly as a row number in the data frame.

index = 1
final_row = nrow(data_frame_name)
# number of rows; this is used to stop the while loop
while (index <= final_row) { # visit row numbers up to and including the final row

  # you can directly use data_frame_name[index, ] and perform your operation of
  # interest at those specific row numbers, and then
  # increment 'index' as per your requirements

  if ((index %% 144) == 0) {
    index = index + 1  # end of a block of 144: move to the next row
  } else {
    index = index + 13 # otherwise skip ahead by 13
  }

}


 