Trying to understand dplyr function - group_by

Question

I am trying to understand how the group_by function works in dplyr. I am using the airquality data set that comes with the datasets package.

I understand that if I do the following, it should arrange the records in increasing order of the Temp variable:

library(dplyr)  # provides arrange() and the %>% pipe
airquality_max1 <- airquality %>% arrange(Temp)

I see that this is the case in airquality_max1. Now I want to arrange the records in increasing order of Temp, but grouped by Month. The end result should first contain all the records for Month == 5 in increasing order of Temp, then all the records for Month == 6 in increasing order of Temp, and so on. So I use the following command:

airquality_max2 <- airquality %>% group_by(Month) %>% arrange(Temp)

However, the results are still sorted only by increasing Temp, not grouped by Month. In other words, the rows of airquality_max1 and airquality_max2 are in the same order.
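The following minimal sketch reproduces this observation. It assumes dplyr 0.7.0 or later is installed, because group_vars() first appeared in that version. is_grouped_df() and group_vars() are existing dplyr helpers, and same_order is only an illustrative variable name. The sketch checks that the grouped result carries the Month grouping but has its rows in the same order as the ungrouped one:

```r
library(dplyr)

airquality_max1 <- airquality %>% arrange(Temp)
airquality_max2 <- airquality %>% group_by(Month) %>% arrange(Temp)

# The second result is a grouped tibble that remembers the Month grouping
is_grouped_df(airquality_max2)   # TRUE
group_vars(airquality_max2)      # "Month"

# Each (Month, Day) pair is unique in airquality, so matching Month and Day
# columns means the two results list the rows in the same order
same_order <- identical(airquality_max1$Month, airquality_max2$Month) &&
  identical(airquality_max1$Day, airquality_max2$Day)
same_order                       # TRUE

# Months are interleaved, i.e. the rows are not sorted by Month
is.unsorted(airquality_max2$Month)  # TRUE
```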

I am not sure why the grouping by Month does not happen before the arrange function. Can anyone help me understand what I am doing wrong here?

More than solving the problem of sorting the data frame by columns, I want to understand the behavior of group_by, because I am using it to explain how group_by is applied to someone else.
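This sketch may help when explaining group_by. On its own, group_by() does not reorder any rows. It records which rows belong together, and later verbs such as summarise(), mutate() and filter() are then evaluated once per group. The sketch assumes a recent dplyr. The names by_month, monthly_max, ranked, hottest, max_temp and temp_rank are hypothetical and chosen only for illustration:

```r
library(dplyr)

by_month <- airquality %>% group_by(Month)

# group_by() does not move any rows; it only records which rows share a Month
identical(by_month$Month, airquality$Month) &&
  identical(by_month$Day, airquality$Day)   # TRUE

# summarise() is evaluated once per group: one row per Month
monthly_max <- by_month %>% summarise(max_temp = max(Temp))
monthly_max

# mutate() is evaluated within each group: rank of Temp inside its own Month
ranked <- by_month %>% mutate(temp_rank = min_rank(Temp)) %>% ungroup()

# filter() is also applied per group: keep the hottest day(s) of each month
hottest <- by_month %>% filter(Temp == max(Temp)) %>% ungroup()
hottest
```

In short, grouping changes how these verbs compute their results, while arrange() ignores it unless asked otherwise.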

Answer 1

arrange ignores group_by; see the breaking changes in dplyr 0.5.0. If you need to order by two columns, you can do:

airquality %>% arrange(Month, Temp)
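This sketch checks that arrange(Month, Temp) produces the order described in the question. Month is the primary sort key, and Temp only breaks ties within each Month. It assumes dplyr is installed. sorted and temps_sorted_within_month are illustrative names:

```r
library(dplyr)

sorted <- airquality %>% arrange(Month, Temp)

# Month is the primary key, so it never decreases down the rows
!is.unsorted(sorted$Month)   # TRUE

# Temp is the secondary key: it is non-decreasing within each month
# (split/tapply keep the within-month row order of `sorted`)
temps_sorted_within_month <- all(
  tapply(sorted$Temp, sorted$Month, function(t) !is.unsorted(t))
)
temps_sorted_within_month    # TRUE

head(sorted[, c("Month", "Day", "Temp")])
```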

For a grouped data frame, you can also set .by_group = TRUE to sort by the grouping variable first:

airquality %>% group_by(Month) %>% arrange(Temp, .by_group = TRUE)
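This sketch compares the two approaches. It assumes dplyr 0.7.0 or later, which is where the .by_group argument of arrange() and group_vars() were introduced. by_group_sorted, plain_sorted and same_order are illustrative names. With .by_group = TRUE, the grouping column is used as the first sort key, so the row order should match arrange(Month, Temp):

```r
library(dplyr)

by_group_sorted <- airquality %>%
  group_by(Month) %>%
  arrange(Temp, .by_group = TRUE)

plain_sorted <- airquality %>% arrange(Month, Temp)

# Same row order as sorting on Month first, then Temp
same_order <- identical(by_group_sorted$Month, plain_sorted$Month) &&
  identical(by_group_sorted$Day, plain_sorted$Day)
same_order                    # TRUE

# Unlike plain_sorted, the result is still grouped by Month
group_vars(by_group_sorted)   # "Month"
```

The practical difference is that the .by_group version keeps the grouping for any per-group steps that follow, whereas arrange(Month, Temp) on the plain data frame returns ungrouped data.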

Answer 1: accepted, score 4, 2017-09-05 02:08:53