
R: Show Groups with highest mean per Variable

I have a dataset with satisfaction scores (0-5) from airline passengers covering multiple categories such as cleanliness, seat comfort, and gate location. The dataset also includes information about class, type of travel, age, and so on.

I want to find out whether business class travelers are, on average, more satisfied in every category than economy class travelers.

I know that I can check the mean satisfaction score for each category (A through n), grouped by class (see below).

library(dplyr)

final_dataset %>%
  group_by(Class) %>%
  summarise_at(vars(Cleanliness), list(mean = mean))

That way I will know what the mean for the different classes is for a given category. I've tried that and it works. This is a lot of effort though and doesn't really look great. There has to be a better way so I can see a list of categories and which class is most satisfied, right?

Class is a factor (find the code below), while the satisfaction scores are doubles.

final_dataset$Class <- as.factor(final_dataset$Class) 

I've also tried this, but it didn't work (and I don't know exactly what it does):

library( data.table )
setDT( final_dataset )
final_dataset[ , .( mean.change = mean( "Cleanliness" ) ),
                 by = Class
              ][ , Class[ which.max( mean.change ) ] ] 

The error message reads:

Error in [.data.table(final_dataset, , .(mean.change = mean("Cleanliness")), : fastmean was passed type character, not numeric or logical

While looking for solutions, I read in other posts about providing sample data, but I have no clue whether this is how to do it, so I have inserted a few rows as a sample. Just for reference: this is where I got the dataset.

ID      Class           Check-in Service   Online Boarding     Gate Location   Cleanliness
<chr>   <dbl>           <dbl>
1       Business        3                  3                   4               3    
2       Economy Plus    2                  2                   3               5
3       Economy         2                  2                   3               2    
4       Business        4                  4                   4               5
5       Economy         1                  1                   3               2

I hope that is all you need to understand my question; I'm fairly new to this.

Thanks in advance for your help!

I'm not exactly sure what you want, but here is my attempt with the data.table package. The tidyverse is essential for R, by the way. I don't understand what you meant by "doesn't really look great" :)

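library(tibble)  # provides tibble()

df <- tibble(Class = c("Business", "Economy Plus", "Economy", "Business"),
             service1 = c(1, 2, 3, 4), service2 = c(1, 2, 3, 4),
             service3 = c(1, 2, 3, 4), service4 = c(1, 2, 3, 4))

# Convert Class to a factor, as in the question
df$Class <- as.factor(df$Class)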

dummy data:

# A tibble: 4 x 5
  Class        service1 service2 service3 service4
  <chr>           <dbl>    <dbl>    <dbl>    <dbl>
1 Business            1        1        1        1
2 Economy Plus        2        2        2        2
3 Economy             3        3        3        3
4 Business            4        4        4        4

--

library(data.table)

df <- as.data.table(df)

# Pool all four service scores and average them within each class
df <- df[, .(satisfaction = mean(c(service1, service2, service3, service4))), by = Class]

output:

          Class satisfaction
1:     Business          2.5
2: Economy Plus          2.0
3:      Economy          3.0
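
The result above pools all four service columns into a single score per class. If the goal is instead to compare classes category by category, as the question asks, one possible approach is to average each score column separately and then pick the top class per column. The sketch below is only an illustration under assumptions: the rows are copied from the sample in the question, and the column names (CheckinService, OnlineBoarding, GateLocation) are hypothetical syntactic versions of the original headers. It also shows why the attempt in the question failed: mean("Cleanliness") receives a character string, whereas .SD with .SDcols hands mean() the numeric columns themselves.

```r
library(data.table)

# Sample rows from the question; column names are assumed, not the real ones.
dt <- data.table(
  Class          = c("Business", "Economy Plus", "Economy", "Business", "Economy"),
  CheckinService = c(3, 2, 2, 4, 1),
  OnlineBoarding = c(3, 2, 2, 4, 1),
  GateLocation   = c(4, 3, 3, 4, 3),
  Cleanliness    = c(3, 5, 2, 5, 2)
)

score_cols <- c("CheckinService", "OnlineBoarding", "GateLocation", "Cleanliness")

# Mean of every score column within each class.
# .SD contains only the columns listed in .SDcols.
class_means <- dt[, lapply(.SD, mean), by = Class, .SDcols = score_cols]

# Reshape to long form: one row per (Class, category) pair.
long <- melt(class_means, id.vars = "Class",
             variable.name = "category", value.name = "mean_score")

# For each category, keep the row of the class with the highest mean.
# which.max() returns the first maximum, so ties go to the earlier class.
top_class <- long[, .SD[which.max(mean_score)], by = category]

print(class_means)
print(top_class)
```

With these five sample rows, Business has the highest mean in three categories and Economy Plus leads in Cleanliness. On the full dataset, score_cols would need to list the real column names.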

Hope this helps you.
