Operating on R Data Frames

Question

I made a data frame of people's names, ages, and favorite movies. I want to write a program that operates on the data frame and gives me the average age of the people who share each favorite movie. Here's what I have.

 persons <- list(
   firstName = c("Steve", "Bob", "Bill", "Chris", "Matt", "Evan"),
   lastName = c("Williams", "Barker", "Barker", "Williams", "Stevenson", "Parker"),
   age = c(22, 30, 41, 14, 9, 93),
   favoriteMovie = c("Alien", "The Shining", "The Shining", "Halloween", "Alien", "Alien")
 )
 d1 <- data.frame(persons$firstName, persons$lastName, persons$age, persons$favoriteMovie)

 d1
  persons.firstName persons.lastName persons.age persons.favoriteMovie
1             Steve         Williams          22                 Alien
2               Bob           Barker          30           The Shining
3              Bill           Barker          41           The Shining
4             Chris         Williams          14             Halloween
5              Matt        Stevenson           9                 Alien
6              Evan           Parker          93                 Alien

I can do it with a loop of if statements, but I don't think that is the most efficient way. I'm sure there's some way to single out the values for each movie, but I'm not sure how.

Answer 1

You could try using tapply:

> with(d1, tapply(persons.age, persons.favoriteMovie, mean))
      Alien   Halloween The Shining 
   41.33333    14.00000    35.50000

You might want to take a look at this answer.
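
Since the question also asks about singling out the values for one specific movie, here is a minimal sketch of two ways to do that in base R. It rebuilds only the two columns it needs, using the column names from the question; the variable names avg_age and shining_avg are illustrative, not part of the original answer.

```r
# Rebuild the relevant columns from the question so the snippet runs on its own.
d1 <- data.frame(
  persons.age = c(22, 30, 41, 14, 9, 93),
  persons.favoriteMovie = c("Alien", "The Shining", "The Shining",
                            "Halloween", "Alien", "Alien")
)

# tapply returns a result named by group, so one movie can be picked out by name.
avg_age <- with(d1, tapply(persons.age, persons.favoriteMovie, mean))
avg_age["The Shining"]

# The same value computed directly with a logical subset, no grouping needed.
shining_avg <- mean(d1$persons.age[d1$persons.favoriteMovie == "The Shining"])
shining_avg
```

The logical subset is enough when only one movie matters; tapply is the better fit when the averages for all movies are needed at once.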

Answer 2

You can use by() for this:

by(d1$persons.age, d1$persons.favoriteMovie, mean)
d1$persons.favoriteMovie: Alien
[1] 41.33333
------------------------------------------------------------------------------------------------------------- 
d1$persons.favoriteMovie: Halloween
[1] 14
------------------------------------------------------------------------------------------------------------- 
d1$persons.favoriteMovie: The Shining
[1] 35.5
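
If a plain data frame is easier to work with than the printed by object, base R's aggregate() is another option. This is a swapped-in alternative rather than part of the answer above, sketched here with the column names from the question; avg_by_movie is an illustrative name.

```r
# Rebuild the relevant columns from the question so the snippet runs on its own.
d1 <- data.frame(
  persons.age = c(22, 30, 41, 14, 9, 93),
  persons.favoriteMovie = c("Alien", "The Shining", "The Shining",
                            "Halloween", "Alien", "Alien")
)

# Formula interface: average persons.age within each persons.favoriteMovie.
# The result is a data frame with one row per movie.
avg_by_movie <- aggregate(persons.age ~ persons.favoriteMovie, data = d1, FUN = mean)
avg_by_movie
```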

Answer 3

The package doBy with the function summaryBy can help you.

library(doBy)
summaryBy(persons.age~persons.favoriteMovie, data=d1, FUN=c(mean))
#persons.favoriteMovie persons.age.mean
#1                 Alien         41.33333
#2             Halloween         14.00000
#3           The Shining         35.50000

Or you could use dplyr:

library(dplyr)
grouped <- group_by(d1, persons.favoriteMovie)
summarise(grouped, mean=mean(persons.age))
#  persons.favoriteMovie     mean
#                 (fctr)    (dbl)
#1                 Alien 41.33333
#2             Halloween 14.00000
#3           The Shining 35.50000
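
The two dplyr steps can also be chained with the pipe operator that dplyr exports, which avoids the intermediate grouped variable. A minimal sketch, assuming dplyr is installed; mean_age and avg_by_movie are illustrative names.

```r
library(dplyr)

# Rebuild the relevant columns from the question so the snippet runs on its own.
d1 <- data.frame(
  persons.age = c(22, 30, 41, 14, 9, 93),
  persons.favoriteMovie = c("Alien", "The Shining", "The Shining",
                            "Halloween", "Alien", "Alien")
)

# Group by movie, then reduce each group to its mean age.
avg_by_movie <- d1 %>%
  group_by(persons.favoriteMovie) %>%
  summarise(mean_age = mean(persons.age))
avg_by_movie
```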

Answer 4

We can use data.table:

library(data.table)
setDT(d1)[,.(persons.age = mean(persons.age)) , persons.favoriteMovie]
#   persons.favoriteMovie persons.age
#1:                 Alien    41.33333
#2:           The Shining    35.50000
#3:             Halloween    14.00000
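
Note that the groups above come back in order of first appearance, not alphabetically. If sorted output is wanted, keyby can be used in place of by. A minimal sketch, assuming data.table is installed; it uses as.data.table() to work on a copy instead of converting d1 in place as setDT() does, and dt, res, and mean_age are illustrative names.

```r
library(data.table)

# Rebuild the relevant columns from the question so the snippet runs on its own.
d1 <- data.frame(
  persons.age = c(22, 30, 41, 14, 9, 93),
  persons.favoriteMovie = c("Alien", "The Shining", "The Shining",
                            "Halloween", "Alien", "Alien")
)

# as.data.table() returns a copy, leaving d1 as an ordinary data frame.
dt <- as.data.table(d1)

# keyby groups like by, but also sorts the result by the grouping column.
res <- dt[, .(mean_age = mean(persons.age)), keyby = persons.favoriteMovie]
res
```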

4 answers

solution1
3 2016-06-16 22:57:05

solution2
2 2016-06-16 22:55:02

solution3
1 2016-06-17 01:31:08

solution4
1 2016-06-17 02:46:55
