I am trying to understand how the group_by function works in dplyr. I am using the airquality data set, which comes with the datasets package (link).
I understand that if I do the following, it should arrange the records in increasing order of the Temp variable:
library(dplyr)
airquality_max1 <- airquality %>% arrange(Temp)
I see that this is the case in airquality_max1. Now I want to arrange the records in increasing order of Temp, but grouped by Month. So the end result should first contain all the records for Month == 5 in increasing order of Temp, then all the records for Month == 6 in increasing order of Temp, and so on. So I use the following command:
airquality_max2 <- airquality %>% group_by(Month) %>% arrange(Temp)
However, I find that the results are still in increasing order of Temp only, not grouped by Month, i.e., airquality_max1 and airquality_max2 are equal.
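A minimal sketch to check this observation, assuming a current dplyr is installed: it compares the row order of the two results and shows that the only difference is the grouping metadata that group_by() attaches (the name same_order is illustrative).

```r
library(dplyr)

airquality_max1 <- airquality %>% arrange(Temp)
airquality_max2 <- airquality %>% group_by(Month) %>% arrange(Temp)

# The rows come back in exactly the same order in both results ...
same_order <- identical(airquality_max1$Temp, airquality_max2$Temp) &&
  identical(airquality_max1$Month, airquality_max2$Month)
print(same_order)

# ... the only difference is the grouping metadata added by group_by()
print(group_vars(airquality_max1))  # character(0): no groups
print(group_vars(airquality_max2))  # "Month"
```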
I am not sure why the grouping by Month does not happen before the arrange function. Can anyone help me understand what I am doing wrong here?
More than solving the problem of sorting the data frame by columns, I want to understand the behavior of group_by, because I am trying to use it to explain the application of group_by to someone.
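For explaining group_by, a sketch like the following may help, assuming dplyr is installed: group_by() does not reorder rows; it changes how later verbs such as summarise() and mutate() compute. The names monthly, ranked and temp_rank_in_month are hypothetical, chosen only for illustration.

```r
library(dplyr)

# summarise() on grouped data returns one row per group (one per Month here)
monthly <- airquality %>%
  group_by(Month) %>%
  summarise(max_temp = max(Temp), n_days = n())
print(monthly)

# mutate() on grouped data computes per group: each day's Temp rank within its Month
ranked <- airquality %>%
  group_by(Month) %>%
  mutate(temp_rank_in_month = min_rank(Temp)) %>%
  ungroup()

# The row order is untouched by group_by()/mutate()
print(identical(ranked$Temp, airquality$Temp))
```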
arrange ignores group_by (see the breaking changes in dplyr 0.5.0). If you need to order by two columns, you can do:
airquality %>% arrange(Month, Temp)
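A quick sanity check of that ordering, as a sketch assuming dplyr is installed (the names airquality_sorted, months_ok and temps_ok are illustrative): Month should never decrease, and Temp should never decrease within a Month.

```r
library(dplyr)

# Sort by Month first, then by Temp within each Month
airquality_sorted <- airquality %>% arrange(Month, Temp)

# Month never decreases from one row to the next (ties allowed)
months_ok <- !is.unsorted(airquality_sorted$Month)

# Within each Month, Temp never decreases
temps_ok <- all(tapply(airquality_sorted$Temp, airquality_sorted$Month,
                       function(t) !is.unsorted(t)))

print(c(months_ok = months_ok, temps_ok = temps_ok))
```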
For a grouped data frame, you can also set .by_group = TRUE to sort by the grouping variable first:
airquality %>% group_by(Month) %>% arrange(Temp, .by_group = TRUE)
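To see that both approaches give the same order, here is a sketch assuming a dplyr version recent enough to support .by_group (by_columns, by_group and same_rows are illustrative names). The difference is that the .by_group result keeps its Month grouping.

```r
library(dplyr)

by_columns <- airquality %>% arrange(Month, Temp)
by_group   <- airquality %>% group_by(Month) %>% arrange(Temp, .by_group = TRUE)

# Same row order, including how ties in (Month, Temp) are broken
same_rows <- identical(by_columns$Month, by_group$Month) &&
  identical(by_columns$Temp, by_group$Temp) &&
  identical(by_columns$Day, by_group$Day)
print(same_rows)

# by_group is still grouped by Month; by_columns has no groups
print(group_vars(by_group))
```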