
Subsetting datasets with unequal number of observations in R

I have a dataset of movies in R with over 5,000 observations, and another dataset of movies paired with the books they're based on, which has just over 1,600 observations. I want to combine the two datasets and keep only the movies that were based on books.

Here are a couple of sample rows from the movies dataset:

movie_title      duration    gross     content_rating    year
Avatar           178       760505847     PG-13           2009
The Jungle Book  106       362645141     PG              2016

And a couple from the books dataset:

movie_title                         book        author          released
Hunger Games: Mockingjay, Part 2    Mockingjay  Suzanne Collins 2015
Insurgent                           Insurgent   Veronica Roth   2015

I only care about the movie_title values that the two datasets have in common. I tried to merge the two datasets by movie title, but the result has 0 observations.

movies<-merge(imdb.movies,booklist, by="movie_title")

I've also tried filtering it with this code:

filter(imdb.movies, imdb.movies$movie_title==booklist$movie_title)

And combining them with this code:

    combined_movies<- imdb.movies[imdb.movies$movie_title==booklist$movie_title]

None of these seem to work. Is there a way to compare if imdb.movies$movie_title==booklist$movie_title and only keep the observations that are the same?

Using dplyr you could use:

inner_join(imdb.movies,booklist)

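This assumes that the movie_title column has the same name in both data frames and that the titles match exactly. Called without a by argument, inner_join() joins on every column name the two data frames share, which, judging by the samples above, is only movie_title.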
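A minimal sketch of the join with an explicit key, assuming the titles differ only by stray surrounding whitespace (one possible reason for getting 0 matches, not something confirmed for these datasets). The two small data frames below are toy stand-ins with made-up values, and the names book_movies, movies_only and movies_base are only illustrative:

```r
library(dplyr)

# Toy stand-ins for the two data frames; the values are made up for illustration.
imdb.movies <- data.frame(
  movie_title = c("Avatar ", "The Jungle Book", "Insurgent "),
  duration    = c(178, 106, 119),
  stringsAsFactors = FALSE
)
booklist <- data.frame(
  movie_title = c("Insurgent", "The Jungle Book"),
  book        = c("Insurgent", "The Jungle Book"),
  stringsAsFactors = FALSE
)

# Assumption: mismatches come from leading/trailing whitespace, so trim the key.
imdb.movies$movie_title <- trimws(imdb.movies$movie_title)
booklist$movie_title    <- trimws(booklist$movie_title)

# Rows whose title appears in both data frames, with columns from both.
book_movies <- inner_join(imdb.movies, booklist, by = "movie_title")

# Only the matching rows of imdb.movies, without adding the book columns.
movies_only <- semi_join(imdb.movies, booklist, by = "movie_title")

# Base R equivalent of the semi join, using %in% instead of ==.
movies_base <- imdb.movies[imdb.movies$movie_title %in% booklist$movie_title, ]
```

The attempts with == compare the two title vectors position by position (recycling the shorter one), so they only find a match when the same title happens to sit in the same row of both data frames. %in% and the joins instead check each title against the whole other column. If the real titles differ in other ways, such as case or punctuation, the key would need further cleaning before any of these return rows.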
