How to find the difference between the max and min values within one group of a variable in a data frame

I have three variables A, B and C in the following format:

A         B     C
Cat1      1    NA       
Cat1      2    NA
Cat1      1    NA
Cat1      2    NA
Cat1      NA   4
Cat1      NA   1
Cat1      NA   6
Cat1      NA   4
Cat1      7    NA       
Cat1      9    NA
Cat1      3    NA
Cat1      2    NA
Cat1      NA   2
Cat1      NA   4 
Cat1      NA   5
Cat1      NA   9
.         .    .
.         .    .        
.         .    .
.         .    .

Let's say that in variable C, each run of consecutive numeric values (that is, values other than NA) should be treated as one group, and I have to find the difference between the maximum and minimum values in that group. Can someone please help?

Desired output:

Sure. The desired output would look like this:

A       Trips    Value
Cat1    Trip1    xx (difference between max and min of that trip)

From what I understand, you could do the following:

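library(data.table)
dt <- fread(text)  # `text` is defined in the data section below
# rleid() starts a new group id whenever is.na(C) changes value
dt[, .(C = diff(range(C))), by = .(grp = rleid(is.na(C)))]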
#   grp  C
#1:   1 NA
#2:   2  5
#3:   3 NA
#4:   4  7
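
To make the grouping step easier to follow, here is a minimal base-R sketch of the same idea without data.table. It assumes column C is copied by hand from the example data, and it uses `rle()` as a stand-in for `rleid()`; the names `runs`, `grp`, `keep` and `diffs` are only illustrative.

```r
# Column C from the example data
C <- c(NA, NA, NA, NA, 4, 1, 6, 4, NA, NA, NA, NA, 2, 4, 5, 9)

# Stand-in for data.table::rleid(): the id increases by one
# every time is.na(C) flips between TRUE and FALSE
runs <- rle(is.na(C))
grp  <- rep(seq_along(runs$lengths), times = runs$lengths)

# Keep only the rows that hold numbers, then take max - min within each run
keep  <- !is.na(C)
diffs <- tapply(C[keep], grp[keep], function(x) diff(range(x)))
diffs  # runs 2 and 4 give 5 and 7
```

Dropping the NA rows before summarising is a design choice: it avoids the NA rows that appear for groups 1 and 3 in the data.table output.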

For B and C at the same time, do:

dt[, lapply(.SD, function(x) diff(range(x))), by = .(grp = rleid(is.na(C))), .SDcols = c('B', 'C')]
#   grp  B  C
#1:   1  1 NA
#2:   2 NA  5
#3:   3  7 NA
#4:   4 NA  7

Another option, which also removes the NAs:

cols <- c('B', 'C')
out <- dt[, lapply(.SD, function(x) diff(range(x))), by = rleid(is.na(C)), .SDcols = cols
          ][, lapply(.SD, na.omit), .SDcols = cols
            ][, grp := rleid(B)]
out
#   B C grp
#1: 1 5   1
#2: 7 7   2

Note that the second and third solutions assume that B is NA whenever C is not, and vice versa.
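
If you are unsure whether your real data meets that assumption, a quick check along these lines may help. This is only a sketch: B and C are copied by hand from the example data, and `complementary` and `B2` are illustrative names.

```r
B <- c(1, 2, 1, 2, NA, NA, NA, NA, 7, 9, 3, 2, NA, NA, NA, NA)
C <- c(NA, NA, NA, NA, 4, 1, 6, 4, NA, NA, NA, NA, 2, 4, 5, 9)

# TRUE only if every row has exactly one of B and C missing
complementary <- all(xor(is.na(B), is.na(C)))
complementary

# A row where both columns hold a value breaks the assumption
B2 <- B
B2[5] <- 3
all(xor(is.na(B2), is.na(C)))
```

When the check fails, grouping on `is.na(C)` alone can mix NA and non-NA values of B inside one group, so `diff(range(x))` returns NA for B there.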

Data

text <- "A         B     C
Cat1      1    NA       
Cat1      2    NA
Cat1      1    NA
Cat1      2    NA
Cat1      NA   4
Cat1      NA   1
Cat1      NA   6
Cat1      NA   4
Cat1      7    NA       
Cat1      9    NA
Cat1      3    NA
Cat1      2    NA
Cat1      NA   2
Cat1      NA   4 
Cat1      NA   5
Cat1      NA   9"

A solution using dplyr and tidyr.

library(dplyr)
library(tidyr)

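dat2 <- dat %>%
  # the counter only advances on NA rows, so each run of numbers shares one value
  mutate(trip = cumsum(is.na(C))) %>%
  drop_na(C) %>%
  # renumber the surviving runs as 1, 2, ... (group_indices(., trip) is deprecated since dplyr 1.0.0)
  mutate(trip = match(trip, unique(trip))) %>%
  group_by(trip) %>%
  summarize(Diff = max(C) - min(C)) %>%
  ungroup()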
dat2

# # A tibble: 2 x 2
#    trip  Diff
#   <int> <dbl>
# 1     1     5
# 2     2     7

Data

dat <- read.table(text = "A         B     C
Cat1      1    NA
Cat1      2    NA
Cat1      1    NA
Cat1      2    NA
Cat1      NA   4
Cat1      NA   1
Cat1      NA   6
Cat1      NA   4
Cat1      7    NA
Cat1      9    NA
Cat1      3    NA
Cat1      2    NA
Cat1      NA   2
Cat1      NA   4
Cat1      NA   5
Cat1      NA   9",
                  header = TRUE, stringsAsFactors = FALSE)
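
Neither answer returns the exact A / Trips / Value layout asked for in the question, so here is one way it might be produced in base R. This is a sketch under assumptions: the example data is rebuilt inline, the `Trip1`, `Trip2`, ... labels are my guess at the intended naming, and `key`, `trips` and `res` are illustrative names.

```r
dat <- data.frame(
  A = rep("Cat1", 16),
  B = c(1, 2, 1, 2, NA, NA, NA, NA, 7, 9, 3, 2, NA, NA, NA, NA),
  C = c(NA, NA, NA, NA, 4, 1, 6, 4, NA, NA, NA, NA, 2, 4, 5, 9),
  stringsAsFactors = FALSE
)

# Same grouping key as the dplyr answer: the counter only advances on NA rows,
# so each unbroken run of numbers in C shares one value
key <- cumsum(is.na(dat$C))

# Keep the rows where C holds a number, together with their key
trips <- dat[!is.na(dat$C), ]
trips$key <- key[!is.na(dat$C)]

# max - min of C per category and per run
res <- aggregate(C ~ A + key, data = trips, FUN = function(x) max(x) - min(x))

# Relabel the runs as Trip1, Trip2, ... within each category (assumed labels)
res <- res[order(res$A, res$key), ]
res$Trips <- paste0("Trip", ave(res$key, res$A, FUN = seq_along))
res <- res[, c("A", "Trips", "C")]
names(res)[3] <- "Value"
res
```

Grouping by both A and the run key means the trip numbering restarts for each category, which seems to be what the desired output implies.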
