Subsetting datasets with unequal number of observations in R

Question

I have a dataset of movies in R with over 5,000 observations, and another dataset that pairs movies with the books they're based on and has just over 1,600 observations. I want to combine the two and keep only the movies that were based on books.

Here are a couple of sample rows from the movies dataset:

movie_title      duration  gross      content_rating  year
Avatar           178       760505847  PG-13           2009
The Jungle Book  106       362645141  PG              2016

And a couple from the books dataset:

movie_title                         book        author          released
Hunger Games: Mockingjay, Part 2    Mockingjay  Suzanne Collins 2015
Insurgent                           Insurgent   Veronica Roth   2015

I only care about the movie_titles that they have in common. I tried to merge the two datasets by movie title and it says there are 0 observations.

movies<-merge(imdb.movies,booklist, by="movie_title")

I've also tried filtering it with this code:

filter(imdb.movies, imdb.movies$movie_title==booklist$movie_title)

And combining them with this code:

    combined_movies<- imdb.movies[imdb.movies$movie_title==booklist$movie_title]

None of these seem to work. Is there a way to compare if imdb.movies$movie_title==booklist$movie_title and only keep the observations that are the same?

Answer 1

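Using dplyr, you could use:

library(dplyr)
inner_join(imdb.movies, booklist, by = "movie_title")

This assumes that the movie_title column has the same name in both data frames and that the titles themselves match exactly, character for character. An inner join keeps only the rows whose movie_title appears in both data frames, so the differing row counts are not a problem.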
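If the join still returns 0 rows, the titles probably differ in some small way between the two data frames. The question does not say what the difference is, so the sketch below only assumes two common culprits: surrounding whitespace and letter case. The toy data and the clean_title helper are made up for illustration. It also shows why the == attempts fail: == compares the two columns position by position (recycling the shorter one), whereas %in% tests whether each title appears anywhere in the other column.

```r
# Toy data standing in for the real data frames (values invented for illustration).
imdb.movies <- data.frame(
  movie_title = c("Avatar ", "The Jungle Book", "insurgent"),
  year = c(2009, 2016, 2015),
  stringsAsFactors = FALSE
)
booklist <- data.frame(
  movie_title = c("Insurgent", "The Jungle Book"),
  book = c("Insurgent", "The Jungle Book"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: strip leading/trailing whitespace and lower-case the title.
# Real data may need more cleaning (punctuation, subtitles, and so on).
clean_title <- function(x) tolower(trimws(x))

imdb.movies$key <- clean_title(imdb.movies$movie_title)
booklist$key <- clean_title(booklist$movie_title)

# Keep only the movies whose cleaned title appears in the book list.
# %in% tests membership, so the data frames may have different row counts.
based_on_books <- imdb.movies[imdb.movies$key %in% booklist$key, ]

# Or carry the book column along by merging on the cleaned key.
combined <- merge(imdb.movies, booklist[, c("key", "book")], by = "key")
```

Comparing setdiff(booklist$key, imdb.movies$key) before and after cleaning is a quick way to see which titles still fail to match.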

solution1
0 2016-12-01 21:25:17
