
Summarizing and spreading data

I have data similar to the following:

df=data.frame(
company=c("McD","McD","McD","KFC","KFC"),
Title=c("Crew Member","Manager","Trainer","Crew Member","Manager"),
Manhours=c(12,NA,5,13,10)
)
df

I would like to transform it into the data frame below:

 df=data.frame(
   company=c("KFC", "McD"),
   Manager=c(1,1),
   Surbodinate=c(1,2),
   TotalEmp=c(2,3),
   TotalHours=c(23,17)  
  )

I have managed to categorise the employees and count them per role as follows:

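library(dplyr)
library(tidyr)

# Store the counts under a new name so the original df (with Manhours) is kept
counts <- df %>%
   mutate(Role = if_else(Title == "Manager", "Manager", "Surbodinate")) %>%
   count(company, Role) %>%
   spread(Role, n, fill = 0) %>%
   as.data.frame() %>%
   mutate(TotalEmp = select(., Manager:Surbodinate) %>%
       apply(1, sum, na.rm = TRUE))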

I have also summarised the man-hours per company:

hours <- df %>%
    group_by(company) %>%
    summarize(TotalHours = sum(Manhours, na.rm = TRUE))

How can I combine these two steps into one, or is there a cleaner/simpler way to get the desired output?

dplyr solution:

df %>%
    mutate(Title = if_else((Title=="Manager" ),
                          "Manager","Surbodinate")) %>%
    group_by(company) %>%
    summarise(
        Manager     = sum(Title == "Manager"),
        Subordinate = sum(Title == "Surbodinate"),
        TotalEmp    = n(),
        Manhours    = sum(Manhours, na.rm = TRUE)
    )

  company Manager Subordinate TotalEmp Manhours
  <fct>     <int>       <int>    <int>    <dbl>
1 KFC           1           1        2       23
2 McD           1           2        3       17
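If you would rather keep the two intermediate results from the question, they can also be joined on company. The following is only a minimal sketch, assuming the sample data from the question and a tidyr version that provides pivot_wider (used here in place of spread); the names counts, totals and result are illustrative and not from the original post.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
df <- data.frame(
  company  = c("McD", "McD", "McD", "KFC", "KFC"),
  Title    = c("Crew Member", "Manager", "Trainer", "Crew Member", "Manager"),
  Manhours = c(12, NA, 5, 13, 10)
)

# Step 1: head count per company and role, one column per role
counts <- df %>%
  mutate(Role = if_else(Title == "Manager", "Manager", "Surbodinate")) %>%
  count(company, Role) %>%
  pivot_wider(names_from = Role, values_from = n, values_fill = list(n = 0L))

# Step 2: total head count and total hours per company
totals <- df %>%
  group_by(company) %>%
  summarise(TotalEmp = n(), TotalHours = sum(Manhours, na.rm = TRUE))

# Combine the two intermediate tables on their shared key
result <- left_join(counts, totals, by = "company")
```

The join keeps each step readable on its own, at the cost of scanning df twice; the single summarise call above does the same work in one pass.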

How about something like this:

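df %>%
  mutate(Role = ifelse(Title == "Manager", "Manager", "Surbodinate")) %>%
  group_by(company) %>%
  mutate(TotalEmp = n(),
         TotalHours = sum(Manhours, na.rm = TRUE)) %>%
  # count rows per role explicitly instead of relying on dcast's defaults
  reshape2::dcast(company + TotalEmp + TotalHours ~ Role,
                  value.var = "Role", fun.aggregate = length)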
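The same idea (carry the totals along as identifier columns, then go wide) can probably be written without reshape2. This is a hedged sketch rather than the answerer's code: it assumes the question's sample data and swaps dcast for dplyr's count plus tidyr's pivot_wider; the name wide is illustrative.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
df <- data.frame(
  company  = c("McD", "McD", "McD", "KFC", "KFC"),
  Title    = c("Crew Member", "Manager", "Trainer", "Crew Member", "Manager"),
  Manhours = c(12, NA, 5, 13, 10)
)

wide <- df %>%
  mutate(Role = if_else(Title == "Manager", "Manager", "Surbodinate")) %>%
  group_by(company) %>%
  # attach the per-company totals to every row
  mutate(TotalEmp = n(), TotalHours = sum(Manhours, na.rm = TRUE)) %>%
  ungroup() %>%
  # count rows per role while keeping the totals as identifier columns
  count(company, TotalEmp, TotalHours, Role) %>%
  pivot_wider(names_from = Role, values_from = n, values_fill = list(n = 0L))
```

Note that the role columns end up after the totals here, so a final select would be needed to match the column order of the desired output exactly.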

This is neither tidyverse nor a one-step process, but with data.table you could do:

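library(data.table)
# Create the keyed data.table that the code below refers to as DT
DT <- as.data.table(df)
setkey(DT, company)

# Per-company totals
totals <- DT[, .(TotalEmp = .N, TotalHours = sum(Manhours, na.rm = TRUE)), by = company]
# Count rows per role, then join the totals on the company key
dcast(DT, company ~ ifelse(Title == "Manager", "Manager", "Surbodinate"))[totals]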

#   company Manager Surbodinate TotalEmp TotalHours
# 1     KFC       1           1        2         23
# 2     McD       1           2        3         17
