How can I find the difference between values based on a condition, by group, in R and create a new column with that value?

Question

I need to calculate the thickness of the hypoxic layer (hypoxia == ODO_mgL < 2.0) by CRN. That is, I need the depth range (in meters) over which DO < 2.0 mg/L, stored in a new column, for each CRN. For example, if I have a CRN (site) with DO > 2 from depth 0m to depth 9m and DO drops below 2 from depth 9m to depth 10m, the thickness would be 1m. If there is no hypoxia for a particular site then the hypoxia thickness should be 0m. I need to create a new column that calculates the difference between the depth where hypoxia (DO<2) starts and the maximum depth for each unique CRN.

I have spent over 4 hours trying to do this (still learning R) and feel like I am getting close, but have not figured it out. I have looked on SO and in other resources, but maybe I am not wording this correctly.

Example df:

dfso <- structure(list(DATE = c("8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021"), TIME = c("9:00:37 AM", "9:00:38 AM", 
"9:00:39 AM", "9:00:40 AM", "9:00:41 AM", "9:00:42 AM", "9:00:43 AM", 
"9:00:44 AM", "9:00:45 AM", "9:00:46 AM", "9:00:47 AM", "9:00:48 AM", 
"9:00:49 AM", "9:00:50 AM", "9:00:51 AM", "9:00:52 AM", "9:00:53 AM", 
"9:00:54 AM", "9:00:55 AM", "9:00:56 AM", "9:00:57 AM", "9:00:58 AM", 
"9:00:59 AM", "9:01:00 AM", "9:01:01 AM", "9:01:02 AM", "9:01:03 AM", 
"9:01:04 AM", "9:01:05 AM", "9:01:06 AM", "9:01:07 AM", "9:01:08 AM", 
"9:01:09 AM", "9:01:10 AM", "9:01:11 AM", "9:01:12 AM", "9:01:13 AM", 
"9:01:14 AM", "9:01:15 AM", "9:01:16 AM", "9:01:17 AM", "9:01:18 AM", 
"9:01:19 AM", "9:01:20 AM", "9:01:21 AM", "9:01:22 AM", "9:01:23 AM", 
"9:01:24 AM", "9:01:25 AM", "9:01:26 AM", "9:01:27 AM", "9:01:28 AM", 
"9:01:29 AM", "9:01:30 AM", "9:01:31 AM", "9:01:32 AM", "9:01:33 AM", 
"9:01:34 AM", "9:01:35 AM", "9:01:36 AM", "9:01:37 AM", "9:01:38 AM", 
"9:01:39 AM", "9:01:40 AM", "9:01:41 AM", "9:01:42 AM", "9:01:43 AM", 
"9:01:44 AM", "9:01:45 AM", "9:01:46 AM", "9:01:47 AM", "9:01:48 AM", 
"9:01:49 AM", "9:01:50 AM", "9:01:51 AM", "9:01:52 AM", "9:01:53 AM", 
"9:01:54 AM", "9:01:55 AM", "9:01:56 AM"), CRN = c(801, 801, 
801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 
801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 
801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 
801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 
801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 
801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801
), ODO_mgL = c(8.84, 8.8, 8.76, 8.72, 8.69, 8.65, 8.63, 8.59, 
8.57, 8.54, 8.49, 8.44, 8.39, 8.35, 8.31, 8.28, 8.25, 8.23, 8.21, 
8.19, 8.17, 8.15, 8.14, 8.12, 8.11, 8.1, 8.09, 7.65, 7.35, 7.1, 
7.11, 7.08, 7.01, 6.56, 6.41, 6.28, 6.22, 6.08, 5.66, 5.53, 5.38, 
5.16, 5.19, 5.1, 5.02, 4.59, 4.43, 4.39, 4.46, 4.44, 4.37, 4.31, 
4.25, 3.8, 3.71, 3.6, 3.57, 3.51, 3.47, 3.43, 3.39, 3.34, 2.95, 
2.81, 2.66, 2.59, 2.51, 2.44, 2.38, 2.32, 2.27, 1.91, 1.85, 1.78, 
1.76, 1.72, 1.7, 1.67, 1.66, 1.63), Depth_m = c(0.039, 0.043, 
0.052, 0.757, 0.678, 0.764, 0.764, 0.833, 0.837, 0.838, 0.857, 
0.893, 2.01, 2.155, 2.368, 2.368, 2.368, 2.406, 4.205, 4.299, 
4.265, 4.265, 4.252, 4.252, 4.253, 4.259, 4.27, 5.291, 5.498, 
5.387, 5.479, 5.479, 5.486, 5.513, 5.562, 5.628, 5.668, 5.722, 
5.772, 5.82, 5.855, 5.917, 5.958, 6, 6.036, 6.06, 6.102, 7.063, 
7.059, 7.035, 6.982, 6.984, 6.997, 7.032, 7.729, 7.581, 7.629, 
7.629, 7.649, 7.68, 7.756, 7.844, 7.972, 8.041, 8.1, 8.159, 8.225, 
8.316, 9.063, 9.218, 9.183, 9.137, 9.159, 9.188, 9.315, 9.52, 
9.625, 9.698, 9.754, 9.816), Vertical_Position_m = c(0.073, 0.223, 
0.662, 0.766, 0.684, 0.725, 0.892, 0.926, 0.781, 0.784, 1.013, 
1.467, 1.848, 2.035, 2.273, 2.359, 2.42, 3.132, 3.827, 4.226, 
4.254, 4.18, 4.227, 4.283, 4.272, 4.29, 4.65, 5.081, 5.352, 5.396, 
5.452, 5.488, 5.568, 5.742, 5.737, 5.761, 5.956, 6.049, 6.161, 
6.163, 6.142, 6.421, 6.426, 6.47, 6.468, 6.37, 6.81, 6.995, 7.052, 
7.033, 6.981, 7.002, 7.12, 7.518, 7.685, 7.637, 7.602, 7.674, 
7.702, 7.926, 8.146, 8.14, 8.34, 8.341, 8.475, 8.724, 8.741, 
8.952, 9.005, 9.166, 9.168, 9.135, 9.353, 9.556, 9.736, 9.874, 
9.917, 9.902, 10.15, 10.221), Temp_C = c(23, 23.0555555555556, 
23.1111111111111, 23.1666666666667, 23.2222222222222, 23.2777777777778, 
23.2777777777778, 23.3888888888889, 23.4444444444444, 23.4444444444444, 
23.5, 23.5555555555556, 23.6666666666667, 23.6666666666667, 23.7777777777778, 
23.7777777777778, 23.8333333333333, 23.8333333333333, 23.8333333333333, 
23.8333333333333, 23.8333333333333, 23.8333333333333, 23.8333333333333, 
23.8333333333333, 23.8333333333333, 23.8333333333333, 23.8333333333333, 
23.8333333333333, 23.8333333333333, 23.8333333333333, 22.5, 22.3888888888889, 
22.3888888888889, 22.2777777777778, 22.2222222222222, 22.0555555555556, 
21.9444444444444, 21.8333333333333, 21.5555555555556, 21.3888888888889, 
21.3333333333333, 21.1111111111111, 19.9444444444444, 19.8888888888889, 
19.7777777777778, 19.7777777777778, 19.7777777777778, 17.8333333333333, 
17.4444444444444, 17.3333333333333, 17.3333333333333, 17.3333333333333, 
17.3333333333333, 17.3333333333333, 17.3333333333333, 17.2777777777778, 
17.2222222222222, 17.1666666666667, 17.1111111111111, 17.0555555555556, 
17, 17, 16.8888888888889, 16.8888888888889, 16.8333333333333, 
16.7777777777778, 16.7222222222222, 16.6666666666667, 16.6666666666667, 
16.6111111111111, 16.5555555555556, 16.5, 16.4444444444444, 16.3333333333333, 
16.2777777777778, 16.2222222222222, 16.1666666666667, 16.1111111111111, 
16, 15.9444444444444)), row.names = c(NA, 80L), class = "data.frame")

I have tried the following:

dfso %>% 
  group_by(CRN) %>%
  summarise(thick = Depth_m[])

This just gives me the same values as in the Depth_m column, so it is not helpful.

dfso %>% 
  group_by(CRN, DO2 = ifelse(ODO_mgL < 2, "below", "above")) %>% 
  mutate(HypoxThick = Depth_m-lag(Depth_m))

This computes the lag/stepwise (not sure what you call it) difference, which is not what I need.

dfso %>% 
  group_by(CRN, DO2thick = ifelse(ODO_mgL < 2, "below", "above")) %>% 
  mutate(HypThick = max(Depth_m))

This just gives me the maximum depth by CRN.

dfso %>%
  group_by(CRN) %>%
  mutate(thick = case_when(ODO_mgL<2 ~ max(Depth_m)-Depth_m))

This is not helpful.

dfso %>% 
  group_by(CRN, DO2 = ifelse(ODO_mgL < 2, "below", "above")) %>% 
  mutate(thick = max(Depth_m) - min(Depth_m))

This last one is extremely close to what I need. In this case the hypoxia thickness is 0.679m, but how do I get this value applied to every row, not only the rows below the ODO_mgL cutoff? The whole "thick" column should contain only the value 0.679.

I looked at many other SO posts, and blog posts elsewhere. Thank you for your time and help!

Answer 1

This solution assumes that your data is sorted by depth (i.e. that the depth where hypoxia starts is the maximum depth for that hypoxic layer). It goes a bit further than your last attempt in that it gives a thickness of 0 if the row is not in a hypoxic layer, and it should also work if you have multiple hypoxic layers in the data.

My approach was:

create a new variable hypoxic_layer which begins at 0 and increments at the start of every hypoxic layer.
then set this column to 0 whenever ODO_mgL is 2 or above, so 0 means no hypoxic layer, and each hypoxic layer gets its own number.
next group by CRN and hypoxic_layer and calculate the thickness as the difference between min() and max() of Depth_m within each group.
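
To make the first two steps concrete, here is a minimal sketch of the layer numbering on a made-up oxygen profile. The vector odo and the names starts and layer_id are hypothetical, chosen only for illustration, and lag() is called with default = Inf so that the reading before the first one counts as non-hypoxic (an assumption, not part of the code below).

```r
library(dplyr)

# Hypothetical oxygen readings (mg/L), ordered from shallow to deep,
# containing two separate hypoxic layers
odo <- c(5, 1.5, 1.8, 3, 1.2, 1.0)

# TRUE on the first reading of each hypoxic layer: the current value is
# below 2 and the previous value is not
starts <- odo < 2 & dplyr::lag(odo, default = Inf) >= 2

# A running count of layer starts gives each layer its own number
layer_id <- cumsum(starts)

# Readings at or above 2 mg/L belong to no hypoxic layer
layer_id[odo >= 2] <- 0L

layer_id
# 0 1 1 0 2 2
```

Grouping by this id (together with CRN) is what lets max() and min() of the depth be taken separately for each layer.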

library(dplyr)

want <- dfso %>%
    mutate(
        # this column increments at the beginning of each new hypoxic layer
        hypoxic_layer = cumsum(if_else(CRN == lag(CRN) & ODO_mgL < 2 & lag(ODO_mgL) >= 2, 1, 0)),
        # set hypoxic_layer = 0 if ODO_mgL >= 2
        hypoxic_layer = if_else(ODO_mgL >= 2, 0, hypoxic_layer)
    ) %>%
    # group by each hypoxic layer in each CRN
    group_by(CRN, hypoxic_layer) %>%
    # the difference between max and min depth in each group is the thickness
    mutate(thickness = if_else(hypoxic_layer == 0, 0, max(Depth_m) - min(Depth_m)))
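
If instead a single thickness per CRN should be repeated on every row, as the question asks, one possible variant is sketched below. It is not part of the answer above. It assumes the hypoxic layer extends from the shallowest reading with ODO_mgL < 2 down to the maximum depth of that CRN, and it uses a small made-up data frame (toy) and a made-up column name (thick) so that it runs on its own.

```r
library(dplyr)

# Hypothetical data: CRN 1 turns hypoxic at 9 m, CRN 2 is never hypoxic
toy <- data.frame(
  CRN     = c(1, 1, 1, 1, 2, 2, 2),
  Depth_m = c(0, 5, 9, 10, 0, 3, 6),
  ODO_mgL = c(8, 5, 1.9, 1.5, 8, 7, 6)
)

result <- toy %>%
  group_by(CRN) %>%
  mutate(
    # one value per CRN, recycled to every row of the group:
    # maximum depth minus the shallowest hypoxic depth, or 0 if the
    # site has no reading below 2 mg/L
    thick = if (any(ODO_mgL < 2)) {
      max(Depth_m) - min(Depth_m[ODO_mgL < 2])
    } else {
      0
    }
  ) %>%
  ungroup()

result
```

With this toy data every row of CRN 1 gets thick = 1 and every row of CRN 2 gets thick = 0. Unlike the layer-by-layer approach, this variant reports only one layer per CRN, so it would overstate the thickness if oxygen rose back above 2 mg/L below the first hypoxic reading.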

Solution 1
0 2022-09-14 23:49:46