Dplyr giving wrong results

I am using dplyr to summarise a dataset, but it is giving the wrong result. My code is below:

Raw_Grp <- Raw_data %>%
    dplyr::group_by(as.character(Raw_data$Gardu)) %>%
    dplyr::summarize(Avg = mean(Raw_data$Age))

Below is the output of str():

'data.frame':    3016 obs. of  2 variables:
 $ Kecamatan: chr  "CENGKARENG" "CENGKARENG" "CENGKARENG" "CENGKARENG" ...
 $ Age      : num  377 370 352 313 299 291 260 223 207 200 ...

Ideally I should get one value per group, but instead I get the overall mean repeated for every distinct group. I have searched and tried everything I could think of, such as converting to a data.table, but the result is the same. If I do the same group-by in Excel or another tool, it gives the correct results. Please help.

Writing Raw_data$columnname inside a dplyr verb extracts the entire column from the original data frame, which bypasses the grouping set up by group_by(). Inside group_by() and summarise(), refer to the columns of interest by their bare names instead:

library(dplyr)
Raw_data %>% 
     group_by(Gardu) %>% 
     summarise(Avg = mean(Age))
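
To make the difference concrete, here is a minimal side-by-side sketch. The data frame `toy` and its values are made up for illustration and are not the asker's data; only the column names `Gardu` and `Age` are borrowed from the question.

```r
library(dplyr)

# Hypothetical toy data (assumption: not the asker's real data set)
toy <- data.frame(
  Gardu = c("a", "a", "b", "b"),
  Age   = c(10, 20, 30, 40)
)

# toy$Age pulls the full column, so the grouping is ignored:
# every group receives the overall mean (25)
wrong <- toy %>%
  group_by(Gardu) %>%
  summarise(Avg = mean(toy$Age))

# The bare column name is evaluated within each group:
# group "a" gets 15, group "b" gets 35
right <- toy %>%
  group_by(Gardu) %>%
  summarise(Avg = mean(Age))

print(wrong)
print(right)
```

The `wrong` result reproduces the symptom described in the question: the same overall mean appears in every group.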

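There are, however, cases where we do need the entire column. In a %>% pipeline, `.` stands for the whole data frame that was piped in, so .$Age is the full 'Age' column regardless of grouping. For example, to compare the 'Age' values within each 'Gardu' group against the values of the whole 'Age' column (the comparison is element-wise, so each group's values are recycled along the full column):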

Raw_data %>%
    group_by(Gardu) %>%
    summarise(n = sum(Age < .$Age))
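
A perhaps more common use of the full column is to reduce it to a single number first and compare each group against that. The sketch below counts, per group, the rows whose 'Age' is below the overall mean. The data frame `toy` and the names `overall_mean`, `below`, `below_dot` and `n_below` are hypothetical and chosen only for this illustration.

```r
library(dplyr)

# Hypothetical toy data (assumption: not the asker's real data set)
toy <- data.frame(
  Gardu = c("a", "a", "b", "b"),
  Age   = c(10, 20, 30, 40)
)

# Overall mean computed once, outside the grouped pipeline
overall_mean <- mean(toy$Age)

# Within each group, count the rows below the overall mean
below <- toy %>%
  group_by(Gardu) %>%
  summarise(n_below = sum(Age < overall_mean))

# Same idea using the magrittr placeholder for the piped-in data
below_dot <- toy %>%
  group_by(Gardu) %>%
  summarise(n_below = sum(Age < mean(.$Age)))

print(below)
print(below_dot)
```

Computing the scalar up front avoids relying on `.`, which is only available with the magrittr pipe %>% and not with the native |> pipe.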

data

Raw_data <- structure(list(Gardu = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 
 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"), 
Age = c(21L, 19L, 38L, 31L, 37L, 47L, 21L, 41L, 42L, 20L, 
34L, 25L, 37L, 37L, 23L)), class = "data.frame", row.names = c(NA, 
-15L))

