How to find the difference of max & min values in one group in a variable in a dataframe
I have three variables A, B and C in the following format:
A B C
Cat1 1 NA
Cat1 2 NA
Cat1 1 NA
Cat1 2 NA
Cat1 NA 4
Cat1 NA 1
Cat1 NA 6
Cat1 NA 4
Cat1 7 NA
Cat1 9 NA
Cat1 3 NA
Cat1 2 NA
Cat1 NA 2
Cat1 NA 4
Cat1 NA 5
Cat1 NA 9
. . .
. . .
. . .
. . .
Let's say that in variable C, each consecutive run of numeric (non-NA) values should be treated as one group, and I have to find the difference between the maximum and minimum values in each group. Can someone please help?
Desired output:
A Trips Value
Cat1 Trip1 xx (difference between max and min of that trip)
From what I understand, you could do the following:
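library(data.table)
# `text` is the sample data string defined in the Data section below
dt <- fread(text = text)
# rleid() starts a new group id each time is.na(C) flips, so every run of non-NA C values is one group
dt[, .(C = diff(range(C))), by = .(grp = rleid(is.na(C)))]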
# grp C
#1: 1 NA
#2: 2 5
#3: 3 NA
#4: 4 7
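The NA rows in that result come from the runs where C is entirely NA (range() of such a run is NA). As a minimal sketch of how one might get closer to the A / Trips / Value layout asked for in the question, the runs can be filtered to non-NA rows and relabelled. The column name `run` and the "Trip1", "Trip2" labels are my own illustrative choices, and grouping by A assumes trips should be numbered separately per category:

```r
library(data.table)

# Sample data copied from the question.
dt <- data.table(
  A = "Cat1",
  B = c(1, 2, 1, 2, NA, NA, NA, NA, 7, 9, 3, 2, NA, NA, NA, NA),
  C = c(NA, NA, NA, NA, 4, 1, 6, 4, NA, NA, NA, NA, 2, 4, 5, 9)
)

# Number the alternating NA / non-NA runs of C within each category.
dt[, run := rleid(is.na(C)), by = A]

# Keep only rows where C has a value, then take max - min per run.
res <- dt[!is.na(C), .(Value = max(C) - min(C)), by = .(A, run)]

# Label the surviving runs Trip1, Trip2, ... within each category (hypothetical labels).
res[, Trips := paste0("Trip", seq_len(.N)), by = A]
res[, run := NULL]
setcolorder(res, c("A", "Trips", "Value"))
res
```

With the sample data this should give two rows for Cat1, with values 5 and 7.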
For B and C at the same time, do:
dt[, lapply(.SD, function(x) diff(range(x))), by = .(grp = rleid(is.na(C))), .SDcols = c('B', 'C')]
# grp B C
#1: 1 1 NA
#2: 2 NA 5
#3: 3 7 NA
#4: 4 NA 7
Another option, which removes the NAs:
cols <- c('B', 'C')
out <- dt[, lapply(.SD, function(x) diff(range(x))), by = rleid(is.na(C)), .SDcols = cols
][, lapply(.SD, na.omit), .SDcols = cols
][, grp := rleid(B)]
out
# B C grp
#1: 1 5 1
#2: 7 7 2
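One caveat about the last step, offered as an observation rather than a correction: rleid(B) numbers runs of equal values, so if two consecutive groups happened to have the same difference in B they would share a grp. A small sketch with made-up values (not from the question) shows the effect, with `.I` (the row number) as one possible alternative:

```r
library(data.table)

# Hypothetical result table in which the first two B differences are equal.
out <- data.table(B = c(3, 3, 7), C = c(5, 2, 2))

# rleid() merges the two equal neighbours into one id.
out[, grp_rleid := rleid(B)]

# .I gives one id per row regardless of the values.
out[, grp_row := .I]
out
```

Here grp_rleid comes out as 1, 1, 2 while grp_row is 1, 2, 3.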
Note that the second and third solutions assume that B is NA whenever C is not, and vice versa.
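If you are unsure whether your real data satisfies that assumption, a quick check along these lines may help. This is only a sketch in base R; the vector name `exclusive` is mine:

```r
# Sample columns copied from the question.
B <- c(1, 2, 1, 2, NA, NA, NA, NA, 7, 9, 3, 2, NA, NA, NA, NA)
C <- c(NA, NA, NA, NA, 4, 1, 6, 4, NA, NA, NA, NA, 2, 4, 5, 9)

# TRUE for rows where exactly one of B and C is NA.
exclusive <- xor(is.na(B), is.na(C))

# Should be TRUE if the assumption holds for every row.
all(exclusive)

# Row numbers that violate the assumption (empty if none).
which(!exclusive)
```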
Data
text <- "A B C
Cat1 1 NA
Cat1 2 NA
Cat1 1 NA
Cat1 2 NA
Cat1 NA 4
Cat1 NA 1
Cat1 NA 6
Cat1 NA 4
Cat1 7 NA
Cat1 9 NA
Cat1 3 NA
Cat1 2 NA
Cat1 NA 2
Cat1 NA 4
Cat1 NA 5
Cat1 NA 9"
A solution using dplyr and tidyr.
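library(dplyr)
library(tidyr)
dat2 <- dat %>%
  # Running count of NAs in C: constant within each run of non-NA values
  mutate(trip = cumsum(is.na(C))) %>%
  # Keep only the rows that belong to a trip
  drop_na(C) %>%
  # Renumber the trips 1, 2, ...; dense_rank() stands in for group_indices(., trip),
  # whose extra arguments are deprecated as of dplyr 1.0.0
  mutate(trip = dense_rank(trip)) %>%
  group_by(trip) %>%
  summarize(Diff = max(C) - min(C)) %>%
  ungroup()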
dat2
# # A tibble: 2 x 2
# trip Diff
# <int> <dbl>
# 1 1 5
# 2 2 7
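To see why cumsum(is.na(C)) works as a trip id, it may help to look at the intermediate values on a shortened, made-up vector. This is a base R sketch of the same idea, not part of the answer above:

```r
# Shortened, hypothetical version of column C.
C <- c(NA, NA, 4, 1, NA, 2, 9)

# The counter increases on every NA and stays flat across non-NA values,
# so each run of non-NA values shares one id.
trip <- cumsum(is.na(C))
trip

# Drop the NA rows, then compute max - min per trip id.
keep <- !is.na(C)
diffs <- tapply(C[keep], trip[keep], function(x) max(x) - min(x))
diffs
```

Here trip is 1 2 2 2 3 3 3, and the two trips (ids 2 and 3) have differences 3 and 7. The ids are not consecutive from 1, which is why the answer renumbers them after dropping the NA rows.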
Data
dat <- read.table(text = "A B C
Cat1 1 NA
Cat1 2 NA
Cat1 1 NA
Cat1 2 NA
Cat1 NA 4
Cat1 NA 1
Cat1 NA 6
Cat1 NA 4
Cat1 7 NA
Cat1 9 NA
Cat1 3 NA
Cat1 2 NA
Cat1 NA 2
Cat1 NA 4
Cat1 NA 5
Cat1 NA 9",
header = TRUE, stringsAsFactors = FALSE)