Adding values from one dataframe to another based on two matching conditions in R

Question

I have two dataframes below:

dput output df1:

structure(list(Location = c("1100 2ND AVENUE", "1100 2ND AVENUE", 
"1100 2ND AVENUE", "1100 2ND AVENUE", "1100 2ND AVENUE", "1100 2ND AVENUE"
), `Ivend Name` = c("3 Mskt 1.92oz", "Almond Joy 1.61oz", "Aquafina 20oz", 
"BCanyonChptleAdzuk1.5oz", "BlkForest FrtSnk 2.25oz", "BluDimndSmkhseAlmd1.5oz"
), `Category Name` = c("Candy", "Candy", "Water", "Salty Snacks", 
"Candy", "Nuts/Trailmix"), Calories = c(240, 220, 0, 215, 193, 
260), Sugars = c("36", "20", "0", "2", "32", "2"), Month = structure(c(4L, 
4L, 4L, 4L, 4L, 4L), .Label = c("Oct", "Nov", "Dec", "Jan", "Feb", 
"Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep"), class = "factor"), 
    Products_available_per_machine = c(0, 0, 0, 0, 0, 0), Units_sold = c(0, 
    0, 0, 0, 0, 0), Total_Sales = c(0, 0, 0, 0, 0, 0), Spoils = c(0, 
    0, 0, 0, 0, 0), Building = c("1100 2ND", "1100 2ND", "1100 2ND", 
    "1100 2ND", "1100 2ND", "1100 2ND"), Item = structure(c(2L, 
    2L, 1L, 2L, 2L, 2L), .Label = c("Beverage", "Food"), class = "factor"), 
    Year = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "2019", class = "factor")), row.names = c(NA, 
-6L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x00000233561b1ef0>)

dput output df2:

structure(list(`Date Ran` = structure(c(1548892800, 1551312000, 
1553817600, 1556582400, 1561680000, 1564531200), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), Year = c(2019, 2019, 2019, 2019, 2019, 
2019), Month = c("January", "February", "March", "April", "June", 
"July"), Location = c("SEA18", "SEA18", "SEA18", "SEA18", "SEA18", 
"SEA18"), Building = c("Alexandria", "Alexandria", "Alexandria", 
"Alexandria", "Alexandria", "Alexandria"), Population = c(1177, 
1179, 1178, 1156, 1163, 1163)), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

I want to pull the pop col from DF 2 and add it to Dataframe 1 based on 'Building' and 'Month' in the order population is filled in DF2.

I tried this command using merge but the col is NULL when I execute:

df_2019_final1$Population <- df_2019_pop$Population[match(df_2019_final1$Month, df_2019_pop$Month, df_2019_final1$Building, df_2019_pop$Building)]

subset_df_pop <- df_2019_pop[, c("Month", "Building", "Population")]


updated_2019_test <- merge(df_2019_final1, subset_df_pop, by = c('Month', 'Building'))

Both produce NULLS and a blank DF

Any help would be greatly appreciated.

Answer 1

In one of the datasets, the 'Month' is abbreviated and in the second it is full name. We can adjust to either one of those formats and the merge would work

df2$MonthN <- month.abb[match(df2$Month, month.name)]
library(dplyr)
left_join(df1, df2[, c("MonthN", "Building", "Population")], 
             by = c('Month' = 'MonthN', 'Building'))

Or with merge

merge(df1, df2[, c("MonthN", "Building", "Population")], 
   by.x = c('Month', 'Building'), by.y = c('MonthN', 'Building'), all.x = TRUE)

Note: The Population column on the merged dataset will be NA based on the example as "Building" values are different in the subset datasets

Adding values from one dataframe to another based on two matching conditions in R

Question

1 answers

solution1
1 ACCPTED 2020-01-31 20:36:04

Adding values from one dataframe to another based on two matching conditions in R

Question

1 answers

solution1 1 ACCPTED 2020-01-31 20:36:04

solution1
1 ACCPTED 2020-01-31 20:36:04