
Operating on R Data Frames

So I made a data frame of people's names, ages, and favorite movies. I want to write a program that takes the data frame and gives me the average age of the people who share each favorite movie. Here's what I have.

 persons <- list(firstName = c("Steve","Bob","Bill","Chris","Matt","Evan"), lastName = c("Williams","Barker","Barker","Williams","Stevenson","Parker"), age = c(22,30,41,14,9,93), favoriteMovie = c("Alien","The Shining","The Shining","Halloween","Alien","Alien"))
 d1 <- data.frame(persons$firstName,persons$lastName,persons$age,persons$favoriteMovie)

 d1
  persons.firstName persons.lastName persons.age persons.favoriteMovie
1             Steve         Williams          22                 Alien
2               Bob           Barker          30           The Shining
3              Bill           Barker          41           The Shining
4             Chris         Williams          14             Halloween
5              Matt        Stevenson           9                 Alien
6              Evan           Parker          93                 Alien

So I can do it with a loop of if statements, but I don't think that's the most efficient way. I'm sure there's some way to single out the values for each movie, but I'm really not sure how.

You could try using tapply

> with(d1, tapply(persons.age, persons.favoriteMovie, mean))
      Alien   Halloween The Shining 
   41.33333    14.00000    35.50000 
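
In case it helps with the "single out values" part of the question, here is a minimal sketch of pulling one movie's average out of the tapply result, next to the plain subsetting equivalent. The names avg_age, alien_avg and alien_direct are only illustrative, and the data frame is rebuilt from the question so the snippet runs on its own.

```r
# Rebuild the data frame from the question so the sketch is self-contained.
persons <- list(
  firstName = c("Steve", "Bob", "Bill", "Chris", "Matt", "Evan"),
  lastName = c("Williams", "Barker", "Barker", "Williams", "Stevenson", "Parker"),
  age = c(22, 30, 41, 14, 9, 93),
  favoriteMovie = c("Alien", "The Shining", "The Shining", "Halloween", "Alien", "Alien")
)
d1 <- data.frame(persons$firstName, persons$lastName, persons$age, persons$favoriteMovie)

# tapply() returns a result named by group, so one group can be looked up by name.
avg_age <- with(d1, tapply(persons.age, persons.favoriteMovie, mean))
alien_avg <- as.numeric(avg_age["Alien"])

# Equivalent for a single movie: pick the rows with a logical index, then average.
alien_direct <- mean(d1$persons.age[d1$persons.favoriteMovie == "Alien"])
```

Both values should agree. The tapply version is probably the better fit when you want every movie at once, while the logical index is enough for a one-off lookup.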

You might want to take a look at this answer

You can use by() for this:

by(d1$persons.age, d1$persons.favoriteMovie, mean)
d1$persons.favoriteMovie: Alien
[1] 41.33333
------------------------------------------------------------------------------------------------------------- 
d1$persons.favoriteMovie: Halloween
[1] 14
------------------------------------------------------------------------------------------------------------- 
d1$persons.favoriteMovie: The Shining
[1] 35.5
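
Note that by() returns an object of class "by", which prints as the blocks above. If you would rather have a regular data frame back, one base R alternative (a different function, not part of the by() approach itself) is aggregate(). A rough sketch, assuming the same column names as in the question; the name avg_by_movie is just illustrative:

```r
# Minimal stand-in for d1 with only the two columns the calculation needs.
d1 <- data.frame(
  persons.age = c(22, 30, 41, 14, 9, 93),
  persons.favoriteMovie = c("Alien", "The Shining", "The Shining",
                            "Halloween", "Alien", "Alien")
)

# Formula interface: average persons.age within each persons.favoriteMovie group.
avg_by_movie <- aggregate(persons.age ~ persons.favoriteMovie, data = d1, FUN = mean)
avg_by_movie
```

The result has one row per movie and keeps the original column names, so it can be merged or subset like any other data frame.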

The package doBy with the function summaryBy can help you.

library(doBy)
summaryBy(persons.age~persons.favoriteMovie, data=d1, FUN=c(mean))
#persons.favoriteMovie persons.age.mean
#1                 Alien         41.33333
#2             Halloween         14.00000
#3           The Shining         35.50000

Or you could use dplyr.

library(dplyr)
grouped <- group_by(d1, persons.favoriteMovie)
summarise(grouped, mean=mean(persons.age))
#  persons.favoriteMovie     mean
#                 (fctr)    (dbl)
#1                 Alien 41.33333
#2             Halloween 14.00000
#3           The Shining 35.50000

We can use data.table

library(data.table)
setDT(d1)[,.(persons.age = mean(persons.age)) , persons.favoriteMovie]
#   persons.favoriteMovie persons.age
#1:                 Alien    41.33333
#2:           The Shining    35.50000
#3:             Halloween    14.00000
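
Two things worth knowing about the call above: setDT() converts d1 to a data.table in place, and grouping with by keeps the groups in order of first appearance, which is why The Shining comes before Halloween in that output. A small sketch, assuming you want to leave the original data frame alone and get the groups sorted; dt and res are illustrative names:

```r
library(data.table)

# Minimal stand-in for d1 with only the two columns the calculation needs.
d1 <- data.frame(
  persons.age = c(22, 30, 41, 14, 9, 93),
  persons.favoriteMovie = c("Alien", "The Shining", "The Shining",
                            "Halloween", "Alien", "Alien")
)

# as.data.table() makes a copy, so d1 itself stays a plain data.frame.
dt <- as.data.table(d1)

# keyby groups like by, but also sorts the result by the grouping column.
res <- dt[, .(persons.age = mean(persons.age)), keyby = persons.favoriteMovie]
res
```

If the in-place conversion is fine for you, setDT(d1) with keyby instead of by should give the same sorted result without the copy.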


 