Extract rows based on a specific condition in an R data frame

Question

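I have a data frame with 77,760 rows, and I only want to extract rows that are 13 rows apart. So I want rows such as the 1st, 14th, 27th, 40th, 53rd, 66th, 79th, 92nd, 105th, 118th, 131st and 144th. But after every multiple of 144, I want to take the very next row (145th, 289th, ...) and extract the same sequence with a row difference of 13 again. So after the 144th row I do not want the 157th row next, but the 145th row. The sequence then continues as 1st ... 144th, 145th, 158th ... until it reaches the next multiple of 144 (the 288th row), and then again: 1st ... 144th, 145th, 158th, 171st ... 288th, 289th, 302nd ... up to the 77,760th row.

So far, following the solution from my previous post, I have used the command below to extract every row with a difference of 13.

my_frame[seq(from = 1, to = nrow(my_frame), by = 13), ]

However, I now want to reset the row sequence after every 144th, 288th, 432nd, ... row and extract the rows from there.

Actual result I get: 1st, 14th ... 144th, 157th, 170th ... 77,760th row

Expected result: 1st, 14th ... 144th, 145th, 158th ... 288th, 289th ... 432nd, 433rd ... 77,760th

Can anyone help me with the logic?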

Answer 1

You can generate the row numbers first and then use them to subset the data frame:

row_numbers <- c(sapply(seq(1, 77760, 144), function(x) seq(x, by = 13, length.out = 12)))

head(row_numbers, 50)
 [1]   1  14  27  40  53  66  79  92 105 118 131 144 145 158 171 184 197 210 223 236 
[21] 249 262 275 288 289 302 315 328 341 354 367 380 393 406 419 432 433 446 459 472
[41] 485 498 511 524 537 550 563 576 577 590

result <- your_df[row_numbers, ]
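The same row numbers can also be built without sapply. The following is only a sketch of that idea, not part of the original answer: the names n, block, step, starts and offsets are introduced here for illustration, and the final filter is an added safeguard for data frames whose row count is not a multiple of 144.

```r
n     <- 77760  # total number of rows (from the question)
block <- 144    # the sequence restarts after every 144 rows
step  <- 13     # row difference inside a block

starts  <- seq(1, n, by = block)          # first row of each block: 1, 145, 289, ...
offsets <- seq(0, block - 1, by = step)   # offsets inside a block: 0, 13, ..., 143

# outer() adds every offset to every block start; the resulting matrix is
# flattened column by column, so each block's rows stay together and in order
row_numbers <- as.vector(outer(offsets, starts, "+"))

# drop any index beyond the last row (only matters if n is not a multiple of block)
row_numbers <- row_numbers[row_numbers <= n]

head(row_numbers, 14)
# result <- your_df[row_numbers, ]
```

Because the block size and step are parameters here, the sketch can be reused for other layouts, as long as the last row of a block is meant to be reached by the step (143 is a multiple of 13 in this case).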

Answer 2

One option is to split the data.frame into blocks of 144 rows:

my_frame1 <- do.call(rbind, lapply(unname(split(my_frame, 
      (seq_len(nrow(my_frame)) - 1) %/% 144 + 1)),
           function(dat) dat[seq(1, nrow(dat), by = 13),]))

row.names(my_frame1)
#[1] "1"   "14"  "27"  "40"  "53"  "66"  "79"  "92"  "105" "118" "131" 
#[12] "144" "145" "158" "171" "184" "197" "210" "223" "236" "249" ...

It may be better to split the sequence of row numbers instead of the data itself:

s1 <-  seq_len(nrow(my_frame))
i1 <- unlist(lapply(unname(split(s1, (s1-1) %/% 144 + 1)),
                `[`, rep(c(TRUE, FALSE), c(1, 12))))
my_frame1 <- my_frame[i1,]

Data

set.seed(24)
my_frame <- data.frame(col1 = sample(1:9, 1000, replace = TRUE), col2 = rnorm(1000))
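
As a cross-check on the split-based index, the selection can also be written as a logical mask using modular arithmetic. This is a sketch rather than part of the original answer; the names keep and my_frame2 are introduced here for illustration. A row is kept when its zero-based position inside its 144-row block is a multiple of 13.

```r
# sample data, same as in the answer
set.seed(24)
my_frame <- data.frame(col1 = sample(1:9, 1000, replace = TRUE), col2 = rnorm(1000))

s1 <- seq_len(nrow(my_frame))

# (s1 - 1) %% 144 is the zero-based position within each block of 144 rows;
# keep the rows whose position is 0, 13, 26, ..., 143
keep <- ((s1 - 1) %% 144) %% 13 == 0

my_frame2 <- my_frame[keep, ]
head(which(keep), 14)
```

The mask needs no intermediate list, and a shorter final block (rows 865 to 1000 in this sample) is handled without extra code.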

Answer 3

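Another option is to generate the row numbers with a while loop and then extract the data from those rows. The 'index' variable jumps from one row number to the next in each iteration of the loop: if the current value of 'index' is a multiple of 144, 'index' is incremented by 1, otherwise it is incremented by 13. Every value taken by 'index' is stored in the 'imp_row' vector.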

index = 1
final_row = nrow(data_frame_name)
# number of rows; this bounds the number generation in the while loop
imp_row = c() # this will hold all the selected row numbers
while(index <= final_row){ # use <= so that the final row itself (e.g. row 77,760) is included
  imp_row = append(imp_row, index)
  if((index %% 144) == 0){
    index = index + 1 # a multiple of 144 ends the block; restart at the next row
  }else{
    index = index + 13
  }
}

head(imp_row, 20)
# now you can index your data frame with the imp_row vector: data_frame_name[imp_row, ]
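
The loop logic can be wrapped in a function and compared against a vectorized result. The following is a sketch under stated assumptions: loop_rows and its parameters block and step are hypothetical names introduced here, and the loop condition uses <= so that the last row is included when it falls on the sequence. The approach relies on the step landing exactly on each multiple of the block size, which holds here because 143 is a multiple of 13.

```r
# generate the row numbers with the loop described above
loop_rows <- function(n, block = 144, step = 13) {
  rows  <- c()
  index <- 1
  while (index <= n) {
    rows <- c(rows, index)
    if (index %% block == 0) {
      index <- index + 1      # end of a block: continue with the very next row
    } else {
      index <- index + step   # inside a block: jump 13 rows ahead
    }
  }
  rows
}

rows_loop <- loop_rows(1000)

# vectorized reference: zero-based position inside the block is a multiple of 13
s <- seq_len(1000)
rows_vec <- s[((s - 1) %% 144) %% 13 == 0]

all(rows_loop == rows_vec)
```

Growing a vector inside a loop is acceptable for a few thousand indices, though the vectorized form is usually the more idiomatic choice in R.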

Alternatively, you can skip recording the 'index' values in 'imp_row' and use 'index' directly as the row number in the data frame.

index = 1
final_row = nrow(data_frame_name)
# number of rows; this bounds the number generation in the while loop
while(index <= final_row){ # use <= so that the final row itself is processed as well

  # you can directly use data_frame_name[index, ] here and perform your
  # operation of interest on that specific row, and then
  # increment 'index' as per your requirements

  if((index %% 144) == 0){
    index = index + 1
  }else{
    index = index + 13
  }

}
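
To make the direct-use variant concrete, here is a small sketch that accumulates a sum over the selected rows. The data frame contents, the column names id and value, and the variable total are all made up for illustration and are not from the original answer.

```r
# hypothetical data: 300 rows with a simple numeric column
data_frame_name <- data.frame(id = 1:300, value = (1:300) * 2)

index     <- 1
final_row <- nrow(data_frame_name)
total     <- 0   # running sum of 'value' over the selected rows

while (index <= final_row) {
  # operate directly on the current row instead of storing its number
  total <- total + data_frame_name[index, "value"]

  if (index %% 144 == 0) {
    index <- index + 1
  } else {
    index <- index + 13
  }
}

total
```

For a simple aggregate like this, subsetting once with a precomputed index vector and calling sum would give the same result; the loop form is mainly useful when each row needs its own processing step.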

3 answers

Answer 1
2 2019-07-16 00:22:56

Answer 2
1 2019-07-16 00:14:01

Answer 3
0 2019-07-16 01:18:57