How to remove rows from a data frame when the values in one column are not increasing in a consecutive way

I have a data frame in R and I want to remove the rows whose values in column B do not increase consecutively. That is, the value in each row has to be higher than the one in the previous row and lower than the one in the next row. I do not want to sort the data frame by column B, because I want to keep the order of column A. I think this can be done with if statements, but I do not have enough experience in R. Thanks in advance.

This is what I have; the starred values are the ones I need to remove.

A       B   
26.00   11158115 
27.00   16722714* 
27.08   11881252 
90.25   69428973 
90.27   69749777 
93.30   64207240* 
95.90   71428751 
96.00   71670964 
107.65  100385980 
107.75  226164158* 
107.8   103280320 

I need this:

A       B   
26.00   11158115 
27.08   11881252 
90.25   69428973 
90.27   69749777 
95.90   71428751 
96.00   71670964 
107.65  100385980 
107.80  103280320 

Here is a solution, sort of:

A <- c(26.00, 27.00, 27.08, 90.25, 90.27, 93.30, 95.90, 96.00, 107.65, 107.75, 107.8)
B <- c(11158115, 16722714, 11881252, 69428973, 69749777, 64207240, 71428751, 71670964, 100385980,
       226164158, 103280320)
d <- data.frame(A, B)
repeat {
   # delta[i] is B[i + 1] - B[i], the difference between successive values of B
   delta <- diff(d$B)
   if (all(delta > 0)) {
      break
   }
   # '1 +' selects the later row of each non-increasing pair, so that row is deleted;
   # drop the '1 +' to delete the earlier row of the pair instead.
   # '<= 0' (rather than '< 0') also treats equal neighbours as not increasing
   iWrong <- 1 + which(delta <= 0)
   d <- d[-iWrong, ]
}

I say "sort of" because it is unclear to me exactly which rows should be removed. Why remove row 2 rather than row 3? Removing either one gives increasing values in B. With my solution you get:

1   26.00  11158115
2   27.00  16722714
4   90.25  69428973
5   90.27  69749777
7   95.90  71428751
8   96.00  71670964
9  107.65 100385980
10 107.75 226164158
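
To make the ambiguity concrete, here is a minimal sketch that runs the loop above both ways: deleting the later row of each decreasing pair (the '1 +' version) and deleting the earlier row instead. The helper name drop_decreases and its offset argument are made up for this illustration, and the sketch uses '<= 0' so that ties also count as not increasing.

```r
# Same data as in the answer above.
d <- data.frame(
  A = c(26.00, 27.00, 27.08, 90.25, 90.27, 93.30, 95.90, 96.00,
        107.65, 107.75, 107.8),
  B = c(11158115, 16722714, 11881252, 69428973, 69749777, 64207240,
        71428751, 71670964, 100385980, 226164158, 103280320)
)

# drop_decreases() is a hypothetical helper, not part of any package.
# offset = 1 deletes the later row of each non-increasing pair,
# offset = 0 deletes the earlier row of the pair.
drop_decreases <- function(d, offset = 1) {
  repeat {
    delta <- diff(d$B)
    if (all(delta > 0)) break
    d <- d[-(offset + which(delta <= 0)), , drop = FALSE]
  }
  d
}

later   <- drop_decreases(d, offset = 1)
earlier <- drop_decreases(d, offset = 0)

later$A    # 26.00 27.00 90.25 90.27 95.90 96.00 107.65 107.75
earlier$A  # 26.00 27.08 93.30 95.90 96.00 107.65 107.80
```

Both results are strictly increasing in B, but neither matches the rows the question asks to keep, and the second variant needs two passes and ends up with only seven rows.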

I can't find a better solution, but at least it works.

df = read.table(text = "A,B
26.00,11158115
27.00,16722714
27.08,11881252
90.25,69428973
90.27,69749777
93.30,64207240
95.90,71428751
96.00,71670964
107.65,100385980
107.75,226164158
107.8,103280320", header = TRUE, sep = ",", stringsAsFactors = FALSE)

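r <- 2
# Stop before the last row, because the test below needs a next row B[r+1]
while (r < nrow(df)) {
    # Row r is out of order if it is below the previous value or above the next one.
    # It is deleted only if its two neighbours are themselves in increasing order.
    if ((df$B[r] < df$B[r-1] || df$B[r] > df$B[r+1]) && df$B[r-1] < df$B[r+1]) {
        df <- df[-r,]    # the next row moves into position r, so r is not advanced
    } else {
        r <- r + 1
    }
}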

df

Output:

        A         B
1   26.00  11158115
3   27.08  11881252
4   90.25  69428973
5   90.27  69749777
7   95.90  71428751
8   96.00  71670964
9  107.65 100385980
11 107.80 103280320

Explanation:

We walk through the rows of the data frame starting from the second one (the first row is always kept, since nothing precedes it). A row is a candidate for deletion when it breaks the expected criterion, that is, when its value is lower than the previous one or higher than the next one (B[r] < B[r-1] or B[r] > B[r+1]). That test alone does not give the expected result, so we also require that the next value is higher than the previous one (B[r-1] < B[r+1]), so that deleting the row leaves its two neighbours in increasing order.
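
The rule described above can be wrapped in a function and its result checked, as in the rough sketch below. The names remove_out_of_order and is_increasing are invented for this example, and the while loop is one assumed way to stop before the last row. The check matters because the rule is local: it only deletes a row when its two neighbours are already in order, so it does not guarantee an increasing result for every input.

```r
# remove_out_of_order() and is_increasing() are hypothetical helper names.
remove_out_of_order <- function(df, col = "B") {
  r <- 2
  while (r < nrow(df)) {          # the last row has no next value to compare with
    x <- df[[col]]
    violates      <- x[r] < x[r - 1] || x[r] > x[r + 1]
    neighbours_ok <- x[r - 1] < x[r + 1]
    if (violates && neighbours_ok) {
      df <- df[-r, , drop = FALSE]  # stay at r: the next row moved into this position
    } else {
      r <- r + 1
    }
  }
  df
}

is_increasing <- function(x) all(diff(x) > 0)

df <- data.frame(
  A = c(26.00, 27.00, 27.08, 90.25, 90.27, 93.30, 95.90, 96.00,
        107.65, 107.75, 107.8),
  B = c(11158115, 16722714, 11881252, 69428973, 69749777, 64207240,
        71428751, 71670964, 100385980, 226164158, 103280320)
)

res <- remove_out_of_order(df)
is_increasing(res$B)   # TRUE for the data in the question

# An input the rule leaves untouched even though it is not increasing:
tricky <- data.frame(A = 1:3, B = c(3, 1, 2))
is_increasing(remove_out_of_order(tricky)$B)   # FALSE
```

For the question's data this keeps exactly the eight rows requested. In the second case, the middle row breaks the order, but its neighbours (3 and 2) are not in increasing order either, so nothing is deleted.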


