
R: Group by one column, return the first row that has a value greater than 0 in any of the other columns, and then return all rows after this row

[Image: Original Dataframe and Output Dataframe]

I'm new to R programming and hope someone can help me with the situation below:

I have a dataframe shown in the picture (Original Dataframe). For each [ID], I would like to return the first record, ordered by the [Date] column, that has a value >= 1 in any of the four columns (A, B, C, or D), together with all later records for that ID. The desired result should look like the Output Dataframe shown in the picture. Basically, remove all the records highlighted in yellow. I would greatly appreciate it if you could provide the R code to achieve this.

structure(list(ID = c(101L, 101L, 101L, 101L, 101L, 101L, 103L, 
103L, 103L, 103L), Date = c(43338L, 43306L, 43232L, 43268L, 43183L, 
43144L, 43310L, 43246L, 43264L, 43209L), A = c(0L, 0L, 0L, 0L, 
0L, 0L, 0L, 1L, 0L, 0L), B = c(0L, 2L, 0L, 0L, 0L, 0L, 0L, 1L, 
0L, 0L), C = c(0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), D = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("ID", "Date", 
"A", "B", "C", "D"), row.names = c(NA, -10L), class = c("data.table", 
"data.frame"))

Here is a base R solution, starting from the data with Date stored as dd.mm.yyyy strings:

    ID       Date A B C D
1  101 26.08.2018 0 0 0 0
2  101 25.07.2018 0 2 0 0
3  101 12.05.2018 0 0 1 0
4  101 17.06.2018 0 0 0 0
5  101 24.03.2018 0 0 0 0
6  101 13.02.2018 0 0 0 0
7  103 29.07.2018 0 0 0 0
8  103 26.05.2018 1 1 0 0
9  103 13.06.2018 0 0 0 0
10 103 19.04.2018 0 0 0 0
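
The integer Date values in the question's dput output appear to be Excel-style serial day numbers: converting them with an origin of 1899-12-30 reproduces the dates printed above. A minimal sketch of that conversion, assuming this origin matches how the data was exported (the names `serial`, `dates` and `formatted` are only illustrative):

```r
# Date column from the question's dput output (integer day counts)
serial <- c(43338L, 43306L, 43232L, 43268L, 43183L,
            43144L, 43310L, 43246L, 43264L, 43209L)

# Assumption: the integers count days from the Excel origin 1899-12-30
dates <- as.Date(serial, origin = "1899-12-30")

# Format as dd.mm.yyyy to compare with the table above
formatted <- format(dates, "%d.%m.%Y")
print(formatted)
```

If the data is kept as integers instead, sorting by the raw numbers gives the same chronological order, so the conversion is only needed for readability.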


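# Row-wise total of the four value columns; a total > 0 means at least one is non-zero
data$Check <- rowSums(data[, c("A", "B", "C", "D")])

# Parse the dd.mm.yyyy strings so that sorting is chronological
data$Date <- as.Date(data$Date, "%d.%m.%Y")

data <- data[order(data$ID, data$Date), ]

id <- unique(data$ID)
final <- NULL

for (i in seq_along(id)) {

    data_sample <- data[data$ID == id[i], ]
    hits <- which(data_sample$Check > 0)

    # Skip IDs with no qualifying row; min() of an empty vector would return Inf
    if (length(hits) == 0) next

    # Keep the first qualifying row and everything after it
    data_sample <- data_sample[min(hits):nrow(data_sample), ]
    final <- rbind(final, data_sample)

}

# Drop the helper column
final$Check <- NULL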

   ID       Date A B C D
3 101 2018-05-12 0 0 1 0
4 101 2018-06-17 0 0 0 0
2 101 2018-07-25 0 2 0 0
1 101 2018-08-26 0 0 0 0
8 103 2018-05-26 1 1 0 0
9 103 2018-06-13 0 0 0 0
7 103 2018-07-29 0 0 0 0
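
The loop above can also be written without explicit iteration. The following is a sketch of a different technique, not the answerer's code: it uses a per-ID running count of qualifying rows (via `ave` and `cumsum`) and keeps every row once that count is above zero. The helper names `hit` and `seen` are made up for illustration, and the data frame is rebuilt from the table shown earlier.

```r
# Rebuild the answer's input: Date as dd.mm.yyyy strings
data <- data.frame(
    ID   = c(101L, 101L, 101L, 101L, 101L, 101L, 103L, 103L, 103L, 103L),
    Date = c("26.08.2018", "25.07.2018", "12.05.2018", "17.06.2018", "24.03.2018",
             "13.02.2018", "29.07.2018", "26.05.2018", "13.06.2018", "19.04.2018"),
    A = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L),
    B = c(0L, 2L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L),
    C = c(0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
    D = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)
)
data$Date <- as.Date(data$Date, "%d.%m.%Y")
data <- data[order(data$ID, data$Date), ]

# TRUE for rows where at least one of A, B, C, D is positive
hit <- rowSums(data[, c("A", "B", "C", "D")] > 0) > 0

# Running count of hits within each ID; it stays 0 until the first hit
seen <- ave(as.integer(hit), data$ID, FUN = cumsum)

# Keep the first hit and every later row of the same ID
final <- data[seen > 0, ]
print(final)
```

An ID with no positive value never gets a count above zero, so all of its rows are dropped without any special handling.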

Here's a tidyverse solution. The filter condition deserves some explanation:

  1. First, we sort by ID and Date and group_by ID.
  2. Then, for each ID (since we're grouped by ID), we apply the filter condition:
    1. Test, for each row, whether any of the variables is > 0.
    2. Get the row numbers of all rows (in the group) where this is the case.
    3. Find the lowest one (since rows are sorted by Date, this will be the earliest).
    4. Get the value of Date for that row.
    5. Keep the rows where Date is >= this value.

Since we're still grouping by ID, all these calculations happen separately for each group:

library(dplyr)

df %>%
    arrange(ID, Date) %>%
    group_by(ID) %>%
    # Keep rows dated on or after the first row (per ID) with any value > 0
    filter(Date >= Date[min(which(A > 0 | B > 0 | C > 0 | D > 0))])

# A tibble: 7 x 6
# Groups:   ID [2]
     ID  Date     A     B     C     D
  <int> <int> <int> <int> <int> <int>
1   101 43232     0     0     1     0
2   101 43268     0     0     0     0
3   101 43306     0     2     0     0
4   101 43338     0     0     0     0
5   103 43246     1     1     0     0
6   103 43264     0     0     0     0
7   103 43310     0     0     0     0
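
To make the filter condition easier to follow, here is a sketch that traces the five steps by hand for a single group (ID 101) in base R. The intermediate names (`g`, `any_pos`, `hits`, `first_hit`, `cutoff`, `kept`) are hypothetical and only serve the illustration; dplyr evaluates the same expression per group without exposing them.

```r
# Rows for ID 101 from the question, already sorted by Date
g <- data.frame(
    Date = c(43144L, 43183L, 43232L, 43268L, 43306L, 43338L),
    A = c(0L, 0L, 0L, 0L, 0L, 0L),
    B = c(0L, 0L, 0L, 0L, 2L, 0L),
    C = c(0L, 0L, 1L, 0L, 0L, 0L),
    D = c(0L, 0L, 0L, 0L, 0L, 0L)
)

# Step 1: test each row for any value > 0
any_pos <- with(g, A > 0 | B > 0 | C > 0 | D > 0)

# Step 2: row numbers where the test is TRUE
hits <- which(any_pos)

# Step 3: the lowest row number, i.e. the earliest such date
first_hit <- min(hits)

# Step 4: the Date at that row
cutoff <- g$Date[first_hit]

# Step 5: keep rows on or after the cutoff
kept <- g[g$Date >= cutoff, ]
print(kept)
```

For this group `hits` is 3 and 5, the cutoff is 43232, and four rows are kept, matching the first four rows of the output above.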
