
Assigning values in a data frame based on matching columns between two different data frames (R)

I am new to R programming and was hoping to get some help with a problem I am stuck on. Currently I have two data frames. One is ign_temp, which has a list of video game titles (about 18,000) and the corresponding platform each appears on (30+ types). Some titles appear multiple times because they were released on multiple platforms, as shown below. This df was filtered to show only title and platform from the original database, which has numerous columns (id, url, release year, etc.).

ign_temp:

            title                           platform
            LittleBigPlanet                 Playstation Vita           
            Splice                          Playstation Vita     
            NHL 13                          Xbox 
            NHL 13                          Android
            Wild                            iPhone
            Mark of the Ninja               Xbox 360
            Mark of the Ninja               PC
            .......

I have another data frame, ign_revised, which has a sample set of the games from the data frame above but has additional columns such as score, year, etc. Each game appears only once, in its own row, and I have added new columns for the possible platforms it can appear on (after the year column, starting at Android and going to Xbox One, around 24 platforms), as shown below (condensed view):

ign_revised:

       id     score_phrase   title     score   genre    year   Android Arcade ...  Xbox One
       315    Cool           Abzu      7.5     Puzzle   2012   Android Arcade ...  Xbox One
       87     Poor           Alan      5.0     Action   2014   Android Arcade ...  Xbox One
       .....
       598    Great          NHL 13    8.5     Sports   2013   Android Arcade ...  Xbox One

ign_revised is alphabetically ordered, and the game platform columns (Android, Arcade, ..., Xbox One) just have the platform name repeated for all 1,600+ titles that appear in this data frame.

My main question: is there a way, such as a for loop, to take the title and the platform columns (Android, Arcade, ..., Xbox One) from ign_revised, match them against the corresponding title and platform in ign_temp, and change the values in ign_revised's platform columns to show 1 (the title appears on that platform) or 0 (it doesn't)? The result would look something like this:

ign_revised (Final Result):

       id     score_phrase   title     score   genre    year   Android Arcade ...  Xbox One
       315    Cool           Abzu      7.5     Puzzle   2012   0       1      ...  0
       87     Poor           Alan      5.0     Action   2014   0       0      ...  1
       .....
       598    Great          NHL 13    8.5     Sports   2013   1       0      ...  1

In my actual ign_revised data frame, title is the 3rd column and the platform columns, starting with Android, begin at the 12th column, if that helps.

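Sketch of the loop:

   for (i in seq_len(nrow(ign_revised))) {

      for (j in 12:ncol(ign_revised)) {

           # Match current title and platform (the name of column j) to ign_temp
           found <- any(ign_temp$title == ign_revised$title[i] &
                        ign_temp$platform == names(ign_revised)[j])

           # Assign current cell (i,j) value with 1 or 0 based on match
           # (stored as "1"/"0" while the column is still character)
           ign_revised[[j]][i] <- as.integer(found)

          }
    }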

Thank you!

@Gregor

EDIT 1: Sorry, I can't seem to space out the modified code correctly in a reply comment. Since ign_temp will need all 18,625 games from the original df (called ign), not just the 7 games I listed above, should I modify it to something like this?

all_title <- ign$title 

all_platform <- ign$platform

ign_temp <- structure(list(title = all_title, platform = all_platform, .Names = c("title","platform"), row.names = c(1, -18625L), class = c("data.frame")))

ign_temp$value = 1
ign_temp_wide = reshape2::dcast(title ~ platform, data = ign_temp,value.var = "value", fill = 0)

merge(ign_revised[1:11], ign_temp_wide)

I'm not sure because I get the error:

Error in rep(1, nrow(data)) : invalid 'times' argument

EDIT 2: Adding the dput output for ign_temp, ign_temp_wide, and ign_revised.

 dput(droplevels(head(ign_temp, 7)))
 structure(list(title = c("LittleBigPlanet PS Vita", "LittleBigPlanet PS Vita -- Marvel Super Hero Edition", 
 "Splice: Tree of Life", "NHL 13", "NHL 13", "Total War Battles: Shogun", 
 "Double Dragon: Neon"), platform = c("PlayStation Vita", "PlayStation Vita", 
 "iPad", "Xbox 360", "PlayStation 3", "Macintosh", "Xbox 360"), 
value = c(1, 1, 1, 1, 1, 1, 1)), .Names = c("title", "platform", 
  "value"), row.names = c(NA, -7L), class = c("tbl_df", "tbl", 
  "data.frame"))



dput(droplevels(head(ign_temp_wide, 7)))

structure(list(title = c("#IDARB", "007 Legends", "1001 Spikes", 
"140", "1979 Revolution", "2014 FIFA World Cup Brazil", "3 Heroes -- Crystal Soul"
), Android = c(0, 0, 0, 0, 0, 0, 0), Arcade = c(0, 0, 0, 0, 0, 
0, 0), iPad = c(0, 0, 0, 0, 0, 0, 0), iPhone = c(0, 0, 0, 0, 
0, 0, 0), Linux = c(0, 0, 0, 0, 0, 0, 0), Macintosh = c(0, 0, 
0, 0, 0, 0, 0), `New Nintendo 3DS` = c(0, 0, 0, 0, 0, 0, 0), 
    `Nintendo 3DS` = c(0, 0, 1, 0, 0, 0, 0), `Nintendo DS` = c(0, 
    0, 0, 0, 0, 0, 0), `Nintendo DSi` = c(0, 0, 0, 0, 0, 0, 1
    ), Ouya = c(0, 0, 0, 0, 0, 0, 0), PC = c(0, 0, 1, 1, 1, 0, 
    0), `PlayStation 3` = c(0, 1, 0, 0, 0, 1, 0), `PlayStation 4` = c(0, 
    0, 1, 0, 0, 0, 0), `PlayStation Portable` = c(0, 0, 0, 0, 
    0, 0, 0), `PlayStation Vita` = c(0, 0, 1, 0, 0, 0, 0), SteamOS = c(0, 
    0, 0, 0, 0, 0, 0), `Web Games` = c(0, 0, 0, 0, 0, 0, 0), 
    Wii = c(0, 0, 0, 0, 0, 0, 0), `Wii U` = c(0, 1, 1, 0, 0, 
    0, 0), `Windows Phone` = c(0, 0, 0, 0, 0, 0, 0), `Windows Surface` = c(0, 
    0, 0, 0, 0, 0, 0), `Xbox 360` = c(0, 1, 0, 0, 0, 1, 0), `Xbox One` = c(1, 
    0, 0, 0, 0, 0, 0)), .Names = c("title", "Android", "Arcade", 
"iPad", "iPhone", "Linux", "Macintosh", "New Nintendo 3DS", "Nintendo 3DS", 
"Nintendo DS", "Nintendo DSi", "Ouya", "PC", "PlayStation 3", 
"PlayStation 4", "PlayStation Portable", "PlayStation Vita", 
"SteamOS", "Web Games", "Wii", "Wii U", "Windows Phone", "Windows Surface", 
"Xbox 360", "Xbox One"), row.names = c(NA, 7L), class = "data.frame")

dput(droplevels(head(ign_revised, 7)))
structure(list(X1 = c(18007L, 145L, 17730L, 17325L, 18475L, 17699L, 
16486L), score_phrase = c("Good", "Bad", "Great", "Great", "Great", 
"Good", "Mediocre"), title = c("#IDARB", "007 Legends", "1001 Spikes", 
"140", "1979 Revolution", "2014 FIFA World Cup Brazil", "3 Heroes -- Crystal Soul"
), url = c("/games/it-draws-a-red-box/xbox-one-20014945", "/games/007-legends/xbox-360-132394", 
"/games/1001-spikes/wii-u-132248", "/games/140-game/pc-20007190", 
"/games/1979-the-game/pc-115360", "/games/2014-fifa-world-cup/ps3-20012688", 
"/games/3-heroes-crystal-soul/dsi-126064"), platform = c("Xbox One", 
"Xbox 360", "Wii U", "PC", "PC", "PlayStation 3", "Nintendo DSi"
), score = c(7.5, 4.5, 8, 8, 8, 7.5, 5), genre = c("Party", "Action", 
"Platformer", "Platformer", "Action, Adventure", "Sports", "Adventure"
), editors_choice = c("N", "N", "N", "N", "N", "N", "N"), release_year = c(2015L, 
2012L, 2014L, 2013L, 2016L, 2014L, 2012L), release_month = c(1L, 
10L, 6L, 10L, 4L, 4L, 1L), release_day = c(14L, 16L, 8L, 16L, 
21L, 17L, 5L), Android = c("Android", "Android", "Android", "Android", 
"Android", "Android", "Android"), Arcade = c("Arcade", "Arcade", 
"Arcade", "Arcade", "Arcade", "Arcade", "Arcade"), iPad = c("iPad", 
"iPad", "iPad", "iPad", "iPad", "iPad", "iPad"), iPhone = c("iPhone", 
"iPhone", "iPhone", "iPhone", "iPhone", "iPhone", "iPhone"), 
    Linux = c("Linux", "Linux", "Linux", "Linux", "Linux", "Linux", 
    "Linux"), Macintosh = c("Macintosh", "Macintosh", "Macintosh", 
    "Macintosh", "Macintosh", "Macintosh", "Macintosh"), `New Nintendo 3DS` = c("New Nintendo 3DS", 
    "New Nintendo 3DS", "New Nintendo 3DS", "New Nintendo 3DS", 
    "New Nintendo 3DS", "New Nintendo 3DS", "New Nintendo 3DS"
    ), `Nintendo 3DS` = c("Nintendo 3DS", "Nintendo 3DS", "Nintendo 3DS", 
    "Nintendo 3DS", "Nintendo 3DS", "Nintendo 3DS", "Nintendo 3DS"
    ), `Nintendo DS` = c("Nintendo DS", "Nintendo DS", "Nintendo DS", 
    "Nintendo DS", "Nintendo DS", "Nintendo DS", "Nintendo DS"
    ), `Nintendo DSi` = c("Nintendo DSi", "Nintendo DSi", "Nintendo DSi", 
    "Nintendo DSi", "Nintendo DSi", "Nintendo DSi", "Nintendo DSi"
    ), Ouya = c("Ouya", "Ouya", "Ouya", "Ouya", "Ouya", "Ouya", 
    "Ouya"), PC = c("PC", "PC", "PC", "PC", "PC", "PC", "PC"), 
    `PlayStation 3` = c("PlayStation 3", "PlayStation 3", "PlayStation 3", 
    "PlayStation 3", "PlayStation 3", "PlayStation 3", "PlayStation 3"
    ), `PlayStation 4` = c("PlayStation 4", "PlayStation 4", 
    "PlayStation 4", "PlayStation 4", "PlayStation 4", "PlayStation 4", 
    "PlayStation 4"), `PlayStation Portable` = c("PlayStation Portable", 
    "PlayStation Portable", "PlayStation Portable", "PlayStation Portable", 
    "PlayStation Portable", "PlayStation Portable", "PlayStation Portable"
    ), `PlayStation Vita` = c("PlayStation Vita", "PlayStation Vita", 
    "PlayStation Vita", "PlayStation Vita", "PlayStation Vita", 
    "PlayStation Vita", "PlayStation Vita"), SteamOS = c("SteamOS", 
    "SteamOS", "SteamOS", "SteamOS", "SteamOS", "SteamOS", "SteamOS"
    ), `Web Games` = c("Web Games", "Web Games", "Web Games", 
    "Web Games", "Web Games", "Web Games", "Web Games"), Wii = c("Wii", 
    "Wii", "Wii", "Wii", "Wii", "Wii", "Wii"), `Wii U` = c("Wii U", 
    "Wii U", "Wii U", "Wii U", "Wii U", "Wii U", "Wii U"), `Windows Phone` = c("Windows Phone", 
    "Windows Phone", "Windows Phone", "Windows Phone", "Windows Phone", 
    "Windows Phone", "Windows Phone"), `Windows Surface` = c("Windows Surface", 
    "Windows Surface", "Windows Surface", "Windows Surface", 
    "Windows Surface", "Windows Surface", "Windows Surface"), 
    `Xbox 360` = c("Xbox 360", "Xbox 360", "Xbox 360", "Xbox 360", 
    "Xbox 360", "Xbox 360", "Xbox 360"), `Xbox One` = c("Xbox One", 
    "Xbox One", "Xbox One", "Xbox One", "Xbox One", "Xbox One", 
    "Xbox One")), .Names = c("X1", "score_phrase", "title", "url", 
"platform", "score", "genre", "editors_choice", "release_year", 
"release_month", "release_day", "Android", "Arcade", "iPad", 
"iPhone", "Linux", "Macintosh", "New Nintendo 3DS", "Nintendo 3DS", 
"Nintendo DS", "Nintendo DSi", "Ouya", "PC", "PlayStation 3", 
"PlayStation 4", "PlayStation Portable", "PlayStation Vita", 
"SteamOS", "Web Games", "Wii", "Wii U", "Windows Phone", "Windows Surface", 
"Xbox 360", "Xbox One"), row.names = c(NA, -7L), class = c("tbl_df", 
"tbl", "data.frame"))

I also checked typeof for the title column in both data frames; both are "character":

> typeof(ign_temp$title)
[1] "character"
> typeof(ign_revised$title)
[1] "character"

@Gregor However, the merge still didn't seem to work. Since it is an inner join, I also tried specifying by = "title", but the platform columns in ign_revised still stay unchanged. Any suggestions?

merge(ign_revised[1:11], ign_temp_wide, by = "title")

I would first cast your ign_temp data frame into wide format, creating the dummy variables as you want, and then join to the ign_revised data.

Using this input:

ign_temp = structure(list(title = c("LittleBigPlanet", "Splice", "NHL 13", 
"NHL 13", "Wild", "Mark of the Ninja", "Mark of the Ninja"), 
    platform = c("Playstation Vita", "Playstation Vita", "Xbox", 
    "Android", "iPhone", "Xbox 360", "PC")), .Names = c("title", 
"platform"), row.names = c(NA, -7L), class = c("data.frame"))

ign_temp$value = 1
ign_temp_wide = reshape2::dcast(title ~ platform, data = ign_temp,
                           value.var = "value", fill = 0)
ign_temp_wide
#               title Android iPhone PC Playstation Vita Xbox Xbox 360
# 1   LittleBigPlanet       0      0  0                1    0        0
# 2 Mark of the Ninja       0      0  1                0    0        1
# 3            NHL 13       1      0  0                0    1        0
# 4            Splice       0      0  0                1    0        0
# 5              Wild       0      1  0                0    0        0
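
If reshape2 is not available, a similar 0/1 table can be sketched in base R with table(). This is an alternative to the dcast call above rather than part of the original approach; the names platform_counts, indicator, and ign_temp_wide_base are made up for illustration.

```r
# Toy version of ign_temp, same rows as the input above
ign_temp <- data.frame(
  title = c("LittleBigPlanet", "Splice", "NHL 13", "NHL 13",
            "Wild", "Mark of the Ninja", "Mark of the Ninja"),
  platform = c("Playstation Vita", "Playstation Vita", "Xbox",
               "Android", "iPhone", "Xbox 360", "PC"),
  stringsAsFactors = FALSE
)

# Cross-tabulate title against platform; each cell counts matching rows
platform_counts <- table(ign_temp$title, ign_temp$platform)

# Turn counts into 0/1 so a duplicated (title, platform) row still gives 1
indicator <- (unclass(platform_counts) > 0) * 1L

# Bind the titles back on as a regular column
ign_temp_wide_base <- data.frame(
  title = rownames(indicator),
  indicator,
  check.names = FALSE,        # keep names such as "Xbox 360" intact
  stringsAsFactors = FALSE
)
rownames(ign_temp_wide_base) <- NULL
```

The result has one row per title and one column per platform, so it can be used in the join in the same way as ign_temp_wide.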

Then the join is simple. This should work:

merge(ign_revised[1:11], ign_temp_wide)
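
Note that merge() returns a new data frame and leaves ign_revised itself untouched, so the result has to be stored to see the 0/1 columns. A minimal sketch on toy data follows; the rows, the name ign_merged, and the use of all.x = TRUE (a left join that keeps titles with no match) are assumptions for illustration, not part of the call above.

```r
# Toy stand-ins for the real data (values are illustrative only)
ign_revised <- data.frame(
  id = c(315, 598),
  title = c("Abzu", "NHL 13"),
  score = c(7.5, 8.5),
  Android = c("Android", "Android"),   # placeholder platform columns
  Xbox = c("Xbox", "Xbox"),
  stringsAsFactors = FALSE
)
ign_temp_wide <- data.frame(
  title = c("NHL 13", "Wild"),
  Android = c(1, 0),
  Xbox = c(1, 0),
  stringsAsFactors = FALSE
)

# Drop the placeholder columns (4:5 here, 12 onward in the real data),
# then store the merged result under a new name
ign_merged <- merge(ign_revised[1:3], ign_temp_wide, by = "title", all.x = TRUE)

# Titles missing from ign_temp_wide come back as NA; treat them as 0
platform_cols <- setdiff(names(ign_temp_wide), "title")
ign_merged[platform_cols][is.na(ign_merged[platform_cols])] <- 0
```

Without all.x = TRUE the merge is an inner join, and "Abzu" would be dropped here because it has no row in ign_temp_wide.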

You just need an inner join here between the non-platform columns of ign_revised (I used 1:11 since you say the platforms start in the 12th column) and the entirety of ign_temp_wide. base::merge works, but you can pick your favorite method from "How to join in R". If you have issues with the join, make sure that title is a character column in both data frames. I'm also assuming that the column name "title" is the same in both data frames.
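
For the full data set, a hand-written structure() call should not be needed, since ign_temp can be subset directly from the original data frame. The `invalid 'times' argument` error in EDIT 1 most likely arises because the .Names, row.names, and class arguments sit inside list(), which leaves ign_temp a plain list for which nrow() returns NULL. A minimal sketch, assuming the original data frame is called ign as in the question (the rows below are toy values):

```r
# Toy stand-in for the original ign data frame (illustrative rows only)
ign <- data.frame(
  title = c("NHL 13", "NHL 13", "Wild"),
  platform = c("Xbox 360", "PlayStation 3", "iPhone"),
  score = c(8.5, 8.5, 7),
  stringsAsFactors = FALSE
)

# Keep only the two columns needed; the result is already a data frame.
# unique() drops repeated (title, platform) pairs so each cell is 0 or 1.
ign_temp <- unique(ign[c("title", "platform")])
ign_temp$value <- 1

# Sanity checks before casting and joining
stopifnot(is.data.frame(ign_temp))        # nrow() works on a data frame
stopifnot(is.character(ign_temp$title))   # join key should be character
```

ign_temp built this way can then go through the same dcast and merge steps as above.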
