How to remove rows from a data frame when the values in a column do not increase consecutively

Question

I have a data frame in R and I want to remove the rows whose values in column B do not increase consecutively. That is, the value in each row has to be higher than the previous one but lower than the next one. I do not want to sort the data frame by column B because I want to keep the order of column A. I think I can do this with if statements, but I do not have enough experience in R. Thanks in advance.

This is what I have; I need to remove the starred values.

A       B   
26.00   11158115 
27.00   16722714* 
27.08   11881252 
90.25   69428973 
90.27   69749777 
93.30   64207240* 
95.90   71428751 
96.00   71670964 
107.65  100385980 
107.75  226164158* 
107.8   103280320

I need this:

A       B   
26.00   11158115 
27.08   11881252 
90.25   69428973 
90.27   69749777 
95.90   71428751 
96.00   71670964 
107.65  100385980 
107.80  103280320

Answer 1

Here is a solution, sort of:

A <- c(26.00, 27.00, 27.08, 90.25, 90.27, 93.30, 95.90, 96.00, 107.65, 107.75, 107.8)
B <- c(11158115, 16722714, 11881252, 69428973, 69749777, 64207240, 71428751, 71670964, 100385980,
       226164158, 103280320)
d <- data.frame(A, B)
repeat {
   delta <- diff(d$B)
               # delta holds the differences between successive values of B;
               # delta[1] is B[2] - B[1]
   if (all(delta > 0)) {
      break
   }
   iWrong <- 1 + which(delta <= 0)
               # '<= 0' also catches ties, which would otherwise loop forever
               # '1 +' means that if the next value is not larger than the previous value
               # (delta is not positive), we delete the next value;
               # remove '1 +' to delete the current value instead
   d <- d[-iWrong, ]
}

I say "sort of" because it is unclear to me exactly which rows should be removed. Why remove row 2 instead of row 3? Either choice gives increasing values in B. With my solution you will get:

1   26.00  11158115
2   27.00  16722714
4   90.25  69428973
5   90.27  69749777
7   95.90  71428751
8   96.00  71670964
9  107.65 100385980
10 107.75 226164158
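
To compare the two choices mentioned in the code comments (dropping the later or the earlier row of each non-increasing pair), the loop can be wrapped in a small function. This is only a sketch: the name `drop_decreasing` and its `drop` argument are made up for illustration, and the function assumes a data frame with a numeric column `B`.

```r
A <- c(26.00, 27.00, 27.08, 90.25, 90.27, 93.30, 95.90, 96.00, 107.65, 107.75, 107.8)
B <- c(11158115, 16722714, 11881252, 69428973, 69749777, 64207240, 71428751, 71670964,
       100385980, 226164158, 103280320)
d <- data.frame(A, B)

# Hypothetical helper: repeatedly drop one row of every non-increasing pair in B.
# drop = "later" removes the second row of the pair, "earlier" removes the first.
drop_decreasing <- function(d, drop = c("later", "earlier")) {
  drop <- match.arg(drop)
  offset <- if (drop == "later") 1L else 0L
  repeat {
    bad <- which(diff(d$B) <= 0)      # positions where B fails to increase
    if (length(bad) == 0) break       # B is strictly increasing: done
    d <- d[-(bad + offset), , drop = FALSE]
  }
  d
}

later   <- drop_decreasing(d, "later")
earlier <- drop_decreasing(d, "earlier")
rownames(later)
rownames(earlier)
```

On the sample data, "later" keeps the original rows 1, 2, 4, 5, 7, 8, 9, 10 and "earlier" keeps rows 1, 3, 6, 7, 8, 9, 11. Both results are strictly increasing in B, but neither matches the rows the question asks for, which is why the choice of rule matters.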

Answer 2

I can't find a better solution, but at least this one works.

df <- read.table(text = "A,B
26.00,11158115
27.00,16722714
27.08,11881252
90.25,69428973
90.27,69749777
93.30,64207240
95.90,71428751
96.00,71670964
107.65,100385980
107.75,226164158
107.8,103280320", header = TRUE, sep = ",", stringsAsFactors = FALSE)

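r <- 2
# Stop before the last row: it has no next value to compare against
while (r < nrow(df)) {
    prev <- df$B[r - 1]
    cur  <- df$B[r]
    nxt  <- df$B[r + 1]

    # Drop row r if it breaks the order while its two neighbours are in order
    if ((cur < prev || cur > nxt) && prev < nxt) {
        df <- df[-r, ]
    } else {
        r <- r + 1
    }
}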

df

Output:

        A         B
1   26.00  11158115
3   27.08  11881252
4   90.25  69428973
5   90.27  69749777
7   95.90  71428751
8   96.00  71670964
9  107.65 100385980
11 107.80 103280320

Explanation:

We walk through the rows of the data frame starting from the second one (the first row is always kept because it has no predecessor). We delete every row that violates the expected criterion, namely that the value must be higher than the previous one and lower than the next one, i.e. rows where (B[r] < B[r-1] or B[r] > B[r+1]). That criterion alone does not give the expected result, so before deleting a row we also check that the next value is higher than the previous one (B[r-1] < B[r+1]). The last row is never tested because it has no successor.
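
To see why the extra check matters, the condition can be evaluated by hand on rows 5 and 6 of the sample data. The helper name `should_drop` is hypothetical and only restates the `if` condition from the loop above.

```r
# Hypothetical helper: TRUE when the middle value should be dropped
should_drop <- function(prev, cur, nxt) {
  (cur < prev || cur > nxt) && prev < nxt
}

# Row 5 (69749777) is larger than row 6, so the first part is TRUE,
# but its neighbours (rows 4 and 6) are themselves out of order, so it is kept.
row5 <- should_drop(69428973, 69749777, 64207240)

# Row 6 (64207240) is smaller than row 5, and its neighbours
# (rows 5 and 7) are in order, so it is the one that gets dropped.
row6 <- should_drop(69749777, 64207240, 71428751)

c(row5 = row5, row6 = row6)
```

Without the `prev < nxt` part, row 5 would be removed as well, even though it belongs to the expected result.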

Solution 1
0 2017-10-05 02:00:25

Solution 2
0 Accepted 2017-10-05 18:20:49