R and dplyr, using group_by to run code per group not working

First of all, I'm quite new to R, so I may be off the mark in my understanding of what is happening here. I'm stuck on this piece of code and need it fixed quickly, so thank you in advance for your time and effort.

I'm trying to find a freezing point per route per year, i.e. the day on which the CT value passes the threshold of 9. Because I'm working with Arctic data, the CT value starts off above 9, so I have to find where it first crosses the threshold from below 9 to above 9. Maybe there are functions for this sort of local min, but I don't know what they are.

I tried writing one long pipe statement, but I had trouble referencing columns, so I attempted to group_by outside of the pipe statement. That didn't work either.

EDIT: Here is a sample. I would like to end up with 1 value (Day of Year) for East 1983 and East 1984. The correct returned values are 6 and 18 respectively.

Route Year  Day_Year    CT
East  1983  1           3
East  1983  2           2
East  1983  3           1
East  1983  4           0
East  1983  5           2
East  1983  6           9.5
East  1984  1           3   
East  1984  3           2
East  1984  9           1
East  1984  10          0
East  1984  14          2
East  1984  18          9.5


library("dplyr")
data_g <- group_by(Sea_Ice, Route, Year)

#Above 9 Freeze-Up
Above_9_A <- 
  #group_by(Sea_Ice, Route, Year) %>%
  data_g %>%
  mutate(row.position = which.min(data_g$CT))%>%
  filter(CT > 9, !SA %in% c("New Ice", "Nilas", "Grey Ice", "Open Water")) %>%
  slice(which.min(Day_Year)) %>%
  mutate(Conc_Threshold = "Above_9")

My current code finds the minimum across ALL routes and ALL years rather than one per route per year.

I just have no idea where to go from here, thank you for your help.

EDIT 2: I've removed the filters on the other columns for now, as they aren't part of my issue.

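What you need to do is create a column that is TRUE when an earlier value has been below 9 AND the current value is at or above 9. The running count cumsum(CT<9) becomes greater than 0 as soon as any row in the group has dropped below 9, so the condition can be written like this: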

data_g %>% group_by(Route, Year) %>% 
  mutate(freezepoint=(cumsum(CT<9)>0 & CT>=9)) %>% 
  filter(freezepoint)

Or, more directly, keeping only the first such row in each group:

data_g %>% group_by(Route, Year) %>% slice(which.max(cumsum(CT<9)>0 & CT>=9))

(Note: this assumes the data frame is already sorted by Day_Year within each Route and Year; if it is not, call arrange(Route, Year, Day_Year) first.)
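
To check the approach against the sample in the question, here is a minimal sketch. It assumes the sample rows are loaded into a data frame called Sea_Ice (the name used in the question's code); the helper column name crossed is only illustrative. It sorts explicitly rather than relying on the existing row order, and uses filter() followed by slice(1) to keep the first crossing per group.

```r
library(dplyr)

# Sample rows copied from the question (only the columns needed here)
Sea_Ice <- data.frame(
  Route    = "East",
  Year     = rep(c(1983, 1984), each = 6),
  Day_Year = c(1, 2, 3, 4, 5, 6, 1, 3, 9, 10, 14, 18),
  CT       = c(3, 2, 1, 0, 2, 9.5, 3, 2, 1, 0, 2, 9.5)
)

freeze_up <- Sea_Ice %>%
  # Make sure rows are in day order within each route and year
  arrange(Route, Year, Day_Year) %>%
  group_by(Route, Year) %>%
  # 'crossed' (illustrative name) is TRUE once CT has been below 9
  # at least once in the group and the current CT is at or above 9
  mutate(crossed = cumsum(CT < 9) > 0 & CT >= 9) %>%
  filter(crossed) %>%
  # Keep only the first qualifying day in each group
  slice(1) %>%
  ungroup() %>%
  select(Route, Year, Day_Year, CT)

print(freeze_up)
```

For the sample this gives Day_Year 6 for East 1983 and 18 for East 1984. One design difference from the slice(which.max(...)) one-liner: which.max() returns 1 when every value is FALSE, so a group with no crossing would return its first row there, whereas filter() followed by slice(1) simply drops such a group.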
