简体   繁体   中英

How to find difference between values based on condition by group in r and create new column with that value?

I need to calculate the thickness of the hypoxic layer (hypoxia == ODO_mgL < 2.0) by CRN. I need the depth (in meters) of how thick hypoxia is for DO < 2.0 mgL (create new column for this thickness of the hypoxic layer) by CRN. For example, if I have a CRN (site) with DO > 2 from depth 0m to depth 9m and DO drops below 2 from depth 9m to depth 10m the thickness would be 1m. If there is no hypoxia for a particular site then the hypoxia thickness should be 0m. I need to create a new column that calculates the difference between the depth where hypoxia (DO<2) starts and the maximum depth for each unique CRN.

I have spent over 4 hours trying to do this (still learning r) and feel like I am getting close, but have not figured it out. I have looked in SO and other resources, but maybe I am not wording this correctly.

Example df:

structure(list(DATE = c("8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021"), TIME = c("9:00:37 AM", "9:00:38 AM", 
"9:00:39 AM", "9:00:40 AM", "9:00:41 AM", "9:00:42 AM", "9:00:43 AM", 
"9:00:44 AM", "9:00:45 AM", "9:00:46 AM", "9:00:47 AM", "9:00:48 AM", 
"9:00:49 AM", "9:00:50 AM", "9:00:51 AM", "9:00:52 AM", "9:00:53 AM", 
"9:00:54 AM", "9:00:55 AM", "9:00:56 AM", "9:00:57 AM", "9:00:58 AM", 
"9:00:59 AM", "9:01:00 AM", "9:01:01 AM", "9:01:02 AM", "9:01:03 AM", 
"9:01:04 AM", "9:01:05 AM", "9:01:06 AM", "9:01:07 AM", "9:01:08 AM", 
"9:01:09 AM", "9:01:10 AM", "9:01:11 AM", "9:01:12 AM", "9:01:13 AM", 
"9:01:14 AM", "9:01:15 AM", "9:01:16 AM", "9:01:17 AM", "9:01:18 AM", 
"9:01:19 AM", "9:01:20 AM", "9:01:21 AM", "9:01:22 AM", "9:01:23 AM", 
"9:01:24 AM", "9:01:25 AM", "9:01:26 AM", "9:01:27 AM", "9:01:28 AM", 
"9:01:29 AM", "9:01:30 AM", "9:01:31 AM", "9:01:32 AM", "9:01:33 AM", 
"9:01:34 AM", "9:01:35 AM", "9:01:36 AM", "9:01:37 AM", "9:01:38 AM", 
"9:01:39 AM", "9:01:40 AM", "9:01:41 AM", "9:01:42 AM", "9:01:43 AM", 
"9:01:44 AM", "9:01:45 AM", "9:01:46 AM", "9:01:47 AM", "9:01:48 AM", 
"9:01:49 AM", "9:01:50 AM", "9:01:51 AM", "9:01:52 AM", "9:01:53 AM", 
"9:01:54 AM", "9:01:55 AM", "9:01:56 AM"), CRN = c(801, 801, 
801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 
801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 
801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 
801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 
801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 
801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801
), ODO_mgL = c(8.84, 8.8, 8.76, 8.72, 8.69, 8.65, 8.63, 8.59, 
8.57, 8.54, 8.49, 8.44, 8.39, 8.35, 8.31, 8.28, 8.25, 8.23, 8.21, 
8.19, 8.17, 8.15, 8.14, 8.12, 8.11, 8.1, 8.09, 7.65, 7.35, 7.1, 
7.11, 7.08, 7.01, 6.56, 6.41, 6.28, 6.22, 6.08, 5.66, 5.53, 5.38, 
5.16, 5.19, 5.1, 5.02, 4.59, 4.43, 4.39, 4.46, 4.44, 4.37, 4.31, 
4.25, 3.8, 3.71, 3.6, 3.57, 3.51, 3.47, 3.43, 3.39, 3.34, 2.95, 
2.81, 2.66, 2.59, 2.51, 2.44, 2.38, 2.32, 2.27, 1.91, 1.85, 1.78, 
1.76, 1.72, 1.7, 1.67, 1.66, 1.63), Depth_m = c(0.039, 0.043, 
0.052, 0.757, 0.678, 0.764, 0.764, 0.833, 0.837, 0.838, 0.857, 
0.893, 2.01, 2.155, 2.368, 2.368, 2.368, 2.406, 4.205, 4.299, 
4.265, 4.265, 4.252, 4.252, 4.253, 4.259, 4.27, 5.291, 5.498, 
5.387, 5.479, 5.479, 5.486, 5.513, 5.562, 5.628, 5.668, 5.722, 
5.772, 5.82, 5.855, 5.917, 5.958, 6, 6.036, 6.06, 6.102, 7.063, 
7.059, 7.035, 6.982, 6.984, 6.997, 7.032, 7.729, 7.581, 7.629, 
7.629, 7.649, 7.68, 7.756, 7.844, 7.972, 8.041, 8.1, 8.159, 8.225, 
8.316, 9.063, 9.218, 9.183, 9.137, 9.159, 9.188, 9.315, 9.52, 
9.625, 9.698, 9.754, 9.816), Vertical_Position_m = c(0.073, 0.223, 
0.662, 0.766, 0.684, 0.725, 0.892, 0.926, 0.781, 0.784, 1.013, 
1.467, 1.848, 2.035, 2.273, 2.359, 2.42, 3.132, 3.827, 4.226, 
4.254, 4.18, 4.227, 4.283, 4.272, 4.29, 4.65, 5.081, 5.352, 5.396, 
5.452, 5.488, 5.568, 5.742, 5.737, 5.761, 5.956, 6.049, 6.161, 
6.163, 6.142, 6.421, 6.426, 6.47, 6.468, 6.37, 6.81, 6.995, 7.052, 
7.033, 6.981, 7.002, 7.12, 7.518, 7.685, 7.637, 7.602, 7.674, 
7.702, 7.926, 8.146, 8.14, 8.34, 8.341, 8.475, 8.724, 8.741, 
8.952, 9.005, 9.166, 9.168, 9.135, 9.353, 9.556, 9.736, 9.874, 
9.917, 9.902, 10.15, 10.221), Temp_C = c(23, 23.0555555555556, 
23.1111111111111, 23.1666666666667, 23.2222222222222, 23.2777777777778, 
23.2777777777778, 23.3888888888889, 23.4444444444444, 23.4444444444444, 
23.5, 23.5555555555556, 23.6666666666667, 23.6666666666667, 23.7777777777778, 
23.7777777777778, 23.8333333333333, 23.8333333333333, 23.8333333333333, 
23.8333333333333, 23.8333333333333, 23.8333333333333, 23.8333333333333, 
23.8333333333333, 23.8333333333333, 23.8333333333333, 23.8333333333333, 
23.8333333333333, 23.8333333333333, 23.8333333333333, 22.5, 22.3888888888889, 
22.3888888888889, 22.2777777777778, 22.2222222222222, 22.0555555555556, 
21.9444444444444, 21.8333333333333, 21.5555555555556, 21.3888888888889, 
21.3333333333333, 21.1111111111111, 19.9444444444444, 19.8888888888889, 
19.7777777777778, 19.7777777777778, 19.7777777777778, 17.8333333333333, 
17.4444444444444, 17.3333333333333, 17.3333333333333, 17.3333333333333, 
17.3333333333333, 17.3333333333333, 17.3333333333333, 17.2777777777778, 
17.2222222222222, 17.1666666666667, 17.1111111111111, 17.0555555555556, 
17, 17, 16.8888888888889, 16.8888888888889, 16.8333333333333, 
16.7777777777778, 16.7222222222222, 16.6666666666667, 16.6666666666667, 
16.6111111111111, 16.5555555555556, 16.5, 16.4444444444444, 16.3333333333333, 
16.2777777777778, 16.2222222222222, 16.1666666666667, 16.1111111111111, 
16, 15.9444444444444)), row.names = c(NA, 80L), class = "data.frame")

I have tried the following:

dfso %>% 
  group_by(CRN) %>%
  summarise(thick = Depth_m[])

this just gives me the same values in the Depth_m column, so not helpful

dfso %>% 
  group_by(CRN, DO2 = ifelse(ODO_mgL < 2, "below", "above")) %>% 
  mutate(HypoxThick = Depth_m-lag(Depth_m))

this does that lag/stepwise (not sure how you call it) difference, which is not what I need

dfso %>% 
  group_by(CRN, DO2thick = ifelse(ODO_mgL < 2, "below", "above")) %>% 
  mutate(HypThick = max(Depth_m))

this just gives me the maximum depth by CRN

dfso %>%
  group_by(CRN) %>%
  mutate(thick = case_when(ODO_mgL<2 ~ max(Depth_m)-Depth_m))

this is not helpful

dfso %>% 
  group_by(CRN, DO2 = ifelse(ODO_mgL < 2, "below", "above")) %>% 
  mutate(thick = max(Depth_m) - min(Depth_m))

This last one is extremely close to what I need. In this case the hypoxia thickness is 0.679m, but how do I get this value to also be applied to that ODO_mgL cutoff? That whole "thick" column should have only the 0.679 value.

I looked at the following posts: this , this , this , this , this , and many other SO posts, and blog posts elsewhere. Thank you for your time and help!

This solution assumes that your data is sorted by depth (ie that the depth where hypoxia starts is the maximum depth for that hypoxic layer). It goes a bit further than your last attempt in that it has thickness of 0 if the row is not in a hypoxic layer, and it should also work if you have multiple hypoxic layers in the data.

My approach was:

  • create a new variable hypoxic_layer which begins at 0 and increments at the start of every hypoxic layer.
  • then set this colum = 0 whenever the ODO_mgL is above 2, so 0 means no hypoxic layer, and all hypoxic layers get their own number
  • next group by CRN and hypoxic_layer and calculate the thickness as the difference between min() and max() of Depth_m within each group.
library(dplyr)

want <- dfso %>% 
    mutate(hypoxic_layer=cumsum(if_else(CRN==lag(CRN) & ODO_mgL<2 & lag(ODO_mgL)>=2,1,0)),       # this column increments at the beginning of each new hypoxic layer
           hypoxic_layer=if_else(ODO_mgL>=2,0,hypoxic_layer)) %>%                # set hypoxic layer = 0 if ODOmgL>=2
    group_by(CRN,hypoxic_layer) %>%                                              # group by each hypoxic layer in each CRN 
    mutate(thickness=if_else(hypoxic_layer==0,0,max(Depth_m)-min(Depth_m)))      # difference between max and min depth in each group is the thickness

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM