
How to find the difference between values based on a condition by group in R and create a new column with that value?

I need to calculate the thickness of the hypoxic layer (hypoxia == ODO_mgL < 2.0) by CRN. That is, I need the depth range (in meters) over which DO < 2.0 mg/L, stored in a new column, for each CRN. For example, if I have a CRN (site) with DO > 2 from depth 0 m to depth 9 m and DO drops below 2 from depth 9 m to depth 10 m, the thickness would be 1 m. If there is no hypoxia for a particular site, then the hypoxia thickness should be 0 m. I need to create a new column that calculates the difference between the depth where hypoxia (DO < 2) starts and the maximum depth for each unique CRN.

I have spent over 4 hours trying to do this (still learning R) and feel like I am getting close, but have not figured it out. I have looked on SO and in other resources, but maybe I am not wording this correctly.

Example df:

dfso <- structure(list(DATE = c("8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021"), TIME = c("9:00:37 AM", "9:00:38 AM", 
"9:00:39 AM", "9:00:40 AM", "9:00:41 AM", "9:00:42 AM", "9:00:43 AM", 
"9:00:44 AM", "9:00:45 AM", "9:00:46 AM", "9:00:47 AM", "9:00:48 AM", 
"9:00:49 AM", "9:00:50 AM", "9:00:51 AM", "9:00:52 AM", "9:00:53 AM", 
"9:00:54 AM", "9:00:55 AM", "9:00:56 AM", "9:00:57 AM", "9:00:58 AM", 
"9:00:59 AM", "9:01:00 AM", "9:01:01 AM", "9:01:02 AM", "9:01:03 AM", 
"9:01:04 AM", "9:01:05 AM", "9:01:06 AM", "9:01:07 AM", "9:01:08 AM", 
"9:01:09 AM", "9:01:10 AM", "9:01:11 AM", "9:01:12 AM", "9:01:13 AM", 
"9:01:14 AM", "9:01:15 AM", "9:01:16 AM", "9:01:17 AM", "9:01:18 AM", 
"9:01:19 AM", "9:01:20 AM", "9:01:21 AM", "9:01:22 AM", "9:01:23 AM", 
"9:01:24 AM", "9:01:25 AM", "9:01:26 AM", "9:01:27 AM", "9:01:28 AM", 
"9:01:29 AM", "9:01:30 AM", "9:01:31 AM", "9:01:32 AM", "9:01:33 AM", 
"9:01:34 AM", "9:01:35 AM", "9:01:36 AM", "9:01:37 AM", "9:01:38 AM", 
"9:01:39 AM", "9:01:40 AM", "9:01:41 AM", "9:01:42 AM", "9:01:43 AM", 
"9:01:44 AM", "9:01:45 AM", "9:01:46 AM", "9:01:47 AM", "9:01:48 AM", 
"9:01:49 AM", "9:01:50 AM", "9:01:51 AM", "9:01:52 AM", "9:01:53 AM", 
"9:01:54 AM", "9:01:55 AM", "9:01:56 AM"), CRN = c(801, 801, 
801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 
801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 
801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 
801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 
801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 
801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801
), ODO_mgL = c(8.84, 8.8, 8.76, 8.72, 8.69, 8.65, 8.63, 8.59, 
8.57, 8.54, 8.49, 8.44, 8.39, 8.35, 8.31, 8.28, 8.25, 8.23, 8.21, 
8.19, 8.17, 8.15, 8.14, 8.12, 8.11, 8.1, 8.09, 7.65, 7.35, 7.1, 
7.11, 7.08, 7.01, 6.56, 6.41, 6.28, 6.22, 6.08, 5.66, 5.53, 5.38, 
5.16, 5.19, 5.1, 5.02, 4.59, 4.43, 4.39, 4.46, 4.44, 4.37, 4.31, 
4.25, 3.8, 3.71, 3.6, 3.57, 3.51, 3.47, 3.43, 3.39, 3.34, 2.95, 
2.81, 2.66, 2.59, 2.51, 2.44, 2.38, 2.32, 2.27, 1.91, 1.85, 1.78, 
1.76, 1.72, 1.7, 1.67, 1.66, 1.63), Depth_m = c(0.039, 0.043, 
0.052, 0.757, 0.678, 0.764, 0.764, 0.833, 0.837, 0.838, 0.857, 
0.893, 2.01, 2.155, 2.368, 2.368, 2.368, 2.406, 4.205, 4.299, 
4.265, 4.265, 4.252, 4.252, 4.253, 4.259, 4.27, 5.291, 5.498, 
5.387, 5.479, 5.479, 5.486, 5.513, 5.562, 5.628, 5.668, 5.722, 
5.772, 5.82, 5.855, 5.917, 5.958, 6, 6.036, 6.06, 6.102, 7.063, 
7.059, 7.035, 6.982, 6.984, 6.997, 7.032, 7.729, 7.581, 7.629, 
7.629, 7.649, 7.68, 7.756, 7.844, 7.972, 8.041, 8.1, 8.159, 8.225, 
8.316, 9.063, 9.218, 9.183, 9.137, 9.159, 9.188, 9.315, 9.52, 
9.625, 9.698, 9.754, 9.816), Vertical_Position_m = c(0.073, 0.223, 
0.662, 0.766, 0.684, 0.725, 0.892, 0.926, 0.781, 0.784, 1.013, 
1.467, 1.848, 2.035, 2.273, 2.359, 2.42, 3.132, 3.827, 4.226, 
4.254, 4.18, 4.227, 4.283, 4.272, 4.29, 4.65, 5.081, 5.352, 5.396, 
5.452, 5.488, 5.568, 5.742, 5.737, 5.761, 5.956, 6.049, 6.161, 
6.163, 6.142, 6.421, 6.426, 6.47, 6.468, 6.37, 6.81, 6.995, 7.052, 
7.033, 6.981, 7.002, 7.12, 7.518, 7.685, 7.637, 7.602, 7.674, 
7.702, 7.926, 8.146, 8.14, 8.34, 8.341, 8.475, 8.724, 8.741, 
8.952, 9.005, 9.166, 9.168, 9.135, 9.353, 9.556, 9.736, 9.874, 
9.917, 9.902, 10.15, 10.221), Temp_C = c(23, 23.0555555555556, 
23.1111111111111, 23.1666666666667, 23.2222222222222, 23.2777777777778, 
23.2777777777778, 23.3888888888889, 23.4444444444444, 23.4444444444444, 
23.5, 23.5555555555556, 23.6666666666667, 23.6666666666667, 23.7777777777778, 
23.7777777777778, 23.8333333333333, 23.8333333333333, 23.8333333333333, 
23.8333333333333, 23.8333333333333, 23.8333333333333, 23.8333333333333, 
23.8333333333333, 23.8333333333333, 23.8333333333333, 23.8333333333333, 
23.8333333333333, 23.8333333333333, 23.8333333333333, 22.5, 22.3888888888889, 
22.3888888888889, 22.2777777777778, 22.2222222222222, 22.0555555555556, 
21.9444444444444, 21.8333333333333, 21.5555555555556, 21.3888888888889, 
21.3333333333333, 21.1111111111111, 19.9444444444444, 19.8888888888889, 
19.7777777777778, 19.7777777777778, 19.7777777777778, 17.8333333333333, 
17.4444444444444, 17.3333333333333, 17.3333333333333, 17.3333333333333, 
17.3333333333333, 17.3333333333333, 17.3333333333333, 17.2777777777778, 
17.2222222222222, 17.1666666666667, 17.1111111111111, 17.0555555555556, 
17, 17, 16.8888888888889, 16.8888888888889, 16.8333333333333, 
16.7777777777778, 16.7222222222222, 16.6666666666667, 16.6666666666667, 
16.6111111111111, 16.5555555555556, 16.5, 16.4444444444444, 16.3333333333333, 
16.2777777777778, 16.2222222222222, 16.1666666666667, 16.1111111111111, 
16, 15.9444444444444)), row.names = c(NA, 80L), class = "data.frame")

I have tried the following:

dfso %>% 
  group_by(CRN) %>%
  summarise(thick = Depth_m[])

this just gives me the same values in the Depth_m column, so not helpful

dfso %>% 
  group_by(CRN, DO2 = ifelse(ODO_mgL < 2, "below", "above")) %>% 
  mutate(HypoxThick = Depth_m-lag(Depth_m))

this computes the lagged (row-to-row) difference, which is not what I need

dfso %>% 
  group_by(CRN, DO2thick = ifelse(ODO_mgL < 2, "below", "above")) %>% 
  mutate(HypThick = max(Depth_m))

this just gives me the maximum depth by CRN

dfso %>%
  group_by(CRN) %>%
  mutate(thick = case_when(ODO_mgL<2 ~ max(Depth_m)-Depth_m))

this is not helpful

dfso %>% 
  group_by(CRN, DO2 = ifelse(ODO_mgL < 2, "below", "above")) %>% 
  mutate(thick = max(Depth_m) - min(Depth_m))

This last one is extremely close to what I need. In this case the hypoxia thickness is 0.679 m, but how do I get this value to also be applied to the rows above that ODO_mgL cutoff? The whole "thick" column should have only the 0.679 value.

I looked at several related SO posts, as well as blog posts elsewhere. Thank you for your time and help!

This solution assumes that your data is sorted by depth (i.e. that each hypoxic layer appears as a contiguous run of rows, beginning at the depth where hypoxia starts). It goes a bit further than your last attempt in that it gives a thickness of 0 if the row is not in a hypoxic layer, and it should also work if you have multiple hypoxic layers in the data.
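The row-order assumption can be checked before relying on the result. The following is only a rough sketch on a made-up toy profile (the `toy` data frame and the `check`, `depth_sorted` and `n_layers` names are hypothetical, not part of the question's data): it reports, per CRN, whether Depth_m is non-decreasing and how many separate hypoxic runs the rows contain.

```r
library(dplyr)

# Hypothetical toy profile with two separate hypoxic runs (not the asker's data)
toy <- data.frame(
  CRN     = c(1, 1, 1, 1, 1),
  Depth_m = c(0.5, 1.0, 1.5, 2.0, 2.5),
  ODO_mgL = c(8.0, 1.5, 3.0, 1.8, 1.2)
)

check <- toy %>%
  group_by(CRN) %>%
  summarise(
    # TRUE when depths never decrease from one row to the next
    depth_sorted = !is.unsorted(Depth_m),
    # count rows where ODO_mgL drops below 2 after a row at or above 2
    n_layers = sum(ODO_mgL < 2 & lag(ODO_mgL, default = Inf) >= 2),
    .groups = "drop"
  )
```

If `n_layers` is larger than expected for a site, the rows of one physical layer are probably being split by readings that hover around the 2 mg/L cutoff, and the per-layer thickness will be smaller than the full hypoxic extent.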

My approach was:我的方法是:

  • create a new variable hypoxic_layer which begins at 0 and increments at the start of every hypoxic layer.
  • then set this column to 0 whenever ODO_mgL is 2 or above, so 0 means no hypoxic layer and every hypoxic layer gets its own number.
  • next, group by CRN and hypoxic_layer and calculate thickness as the difference between max() and min() of Depth_m within each group.
library(dplyr)

want <- dfso %>% 
    group_by(CRN) %>%                                                             # detect layers separately within each CRN
    mutate(hypoxic_layer=cumsum(ODO_mgL<2 & lag(ODO_mgL, default=Inf)>=2),        # this column increments at the first row of each new hypoxic layer
           hypoxic_layer=if_else(ODO_mgL>=2,0L,hypoxic_layer)) %>%                # set hypoxic_layer = 0 if ODO_mgL >= 2
    group_by(CRN,hypoxic_layer) %>%                                               # group by each hypoxic layer in each CRN 
    mutate(thickness=if_else(hypoxic_layer==0,0,max(Depth_m)-min(Depth_m))) %>%   # difference between max and min depth in each group is the thickness
    ungroup()                                                                     # lag(default=Inf) lets a CRN whose first row is already hypoxic start a layer
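The question also asks for a single value repeated on every row of a CRN: the maximum depth minus the depth where hypoxia starts, or 0 when the site has no hypoxia. A minimal sketch of that variant follows, assuming one hypoxic layer per site that extends to the deepest reading. The `toy` data and the `HypoxThick` column are illustrative only, with values chosen to mirror the 9 m to 10 m example in the question.

```r
library(dplyr)

# Hypothetical toy data: site 1 turns hypoxic at 9 m, site 2 never does
toy <- data.frame(
  CRN     = c(1, 1, 1, 1, 2, 2, 2),
  Depth_m = c(0, 5, 9, 10, 0, 4, 8),
  ODO_mgL = c(8, 6, 1.9, 1.5, 9, 7, 5)
)

per_site <- toy %>%
  group_by(CRN) %>%
  mutate(HypoxThick = if (any(ODO_mgL < 2)) {
    # deepest reading minus the shallowest hypoxic reading in this CRN
    max(Depth_m) - min(Depth_m[ODO_mgL < 2])
  } else {
    0  # no hypoxic rows at this site
  }) %>%
  ungroup()

per_site$HypoxThick
# 1 1 1 1 0 0 0
```

Applied to the example data in the question, the same expression would work out to 9.816 - 9.137 = 0.679 for CRN 801, which is the value the question expects in every row.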
