
Using Merge in R to replicate Vlookup

This kind of helped: "How to do vlookup in R"

Problem: I have a list of machine numbers in a database, and each one needs a machine rate associated with it (e.g. $20.00). In a CSV file (machine_rates.csv), I have a list of those machine numbers with the associated machine rates (columns A and B, respectively).

I've tried using merge for this, but for some reason it creates a lot of NAs throughout the data frame even though I set all.x = TRUE. It almost seems that if a machine number doesn't show up for a row, the whole row turns into NAs. This leads me to believe I am not understanding the merge function correctly (I have read through many posts trying to find the equivalent of a vlookup in R).

Below, I tried to create a new data frame with the merge. When merging, how do you tell it to create a new column to hold the merged machine rates?

dBase = dbReadTable(conn, "Mfng_Data")
mBase = read.csv("Machine_Rates.csv")

dBase2 = merge(dBase, mBase, by.x = "machine_number", by.y = "machine_number",
               all.x = TRUE)

Edit:
Is there a way to get around listing all of the items out? dBase contains about a million records (roughly 1 million rows by 70 columns). So if there are 150 different machine rates, would I have to list all of them out, or is it possible to "index" the values in the CSV by matching the machine number in mBase to the machine number in dBase?

A dplyr solution.

library(dplyr)
dbase <- data.frame(machine_number = c("10","20","30","10","10","50"),
                second_attribute=c("a","b","c","c","a","d"))
mbase <- data.frame(machine_number = c("10","20","30","40","50","60","70","80","90","100"),
                    rate=c(22,22,25,17,15,15,55,12,15,19))

left_join(dbase, mbase, by = "machine_number") 

  machine_number second_attribute rate
1             10                a   22
2             20                b   22
3             30                c   25
4             10                c   22
5             10                a   22
6             50                d   15
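
If you would rather not load dplyr, a base R sketch of the same lookup uses match(), which is probably the closest analogue of VLOOKUP: it takes the first matching row and never changes the number of rows in dbase. The data are toy values, and the column names are assumed to match the example above.

```r
# Toy data mirroring the example above; column names are assumptions.
dbase <- data.frame(machine_number = c("10", "20", "30", "10", "10", "50"),
                    second_attribute = c("a", "b", "c", "c", "a", "d"),
                    stringsAsFactors = FALSE)
mbase <- data.frame(machine_number = c("10", "20", "30", "40", "50"),
                    rate = c(22, 22, 25, 17, 15),
                    stringsAsFactors = FALSE)

# For each key in dbase, find the row position of its first occurrence
# in mbase (NA when the key is not present in mbase).
idx <- match(dbase$machine_number, mbase$machine_number)

# Index the rate column by those positions to create the new column.
dbase$rate <- mbase$rate[idx]
dbase
```

Because the result is assigned as a new column, the 70 existing columns and the row order of dbase are left as they were.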

If you want the equivalent of an inner join, remove the all.x argument. It seems you're looking for a left join, though, which is what you've already tried. Check your Mfng_Data table; the data themselves might be the source of those mysterious NAs. Also, if the merge column has the same name in both data frames, you can leave out the by argument (merge then joins on all column names the two data frames share).

dbase <- data.frame(machine_number = c("10", "20", "10", "30", "25"),
                    stringsAsFactors = FALSE)

mbase <- data.frame(machine_number = c("10", "20", "30", "40"),
                    machine_rate = c(32, 65, 12, 22),
                    stringsAsFactors = FALSE)

merge(dbase, mbase, all.x = TRUE)


  machine_number machine_rate
1             10           32
2             10           32
3             20           65
4             25           NA
5             30           12
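
To check whether the keys themselves are producing the NAs, something like the following may help. It is only a sketch on made-up data: the stray trailing space in "20 " is a hypothetical problem, not something known about the asker's table, and the column names are assumed.

```r
# Hypothetical data: one key has trailing whitespace, one has no rate at all.
dbase <- data.frame(machine_number = c("10", "20 ", "30", "25"),
                    stringsAsFactors = FALSE)
mbase <- data.frame(machine_number = c("10", "20", "30", "40"),
                    machine_rate = c(32, 65, 12, 22),
                    stringsAsFactors = FALSE)

# Compare the key types on both sides; a mismatch is worth investigating.
key_classes <- c(dbase = class(dbase$machine_number),
                 mbase = class(mbase$machine_number))

# Rows that are already entirely NA before any merge takes place.
all_na_rows <- sum(rowSums(is.na(dbase)) == ncol(dbase))

# Keys in dbase that have no counterpart in mbase.
unmatched_raw <- setdiff(dbase$machine_number, mbase$machine_number)

# Normalise both key columns to trimmed character strings and check again.
dbase$machine_number <- trimws(as.character(dbase$machine_number))
mbase$machine_number <- trimws(as.character(mbase$machine_number))
unmatched_clean <- setdiff(dbase$machine_number, mbase$machine_number)

merged <- merge(dbase, mbase, all.x = TRUE)
```

Any key still listed in unmatched_clean has no rate in the lookup table, so an NA in that row's rate column is expected with a left join.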

Another option for when you only have a limited number of items you want to match.

dbase <- data.frame(machine_number = c("10","20","30","10","10","50"),
                second_attribute=c("a","b","c","c","a","d"),
                stringsAsFactors = FALSE)

Note that for this method the machine number is stored as a character string rather than a number, because the lookup vector is indexed by name.

You can define a small lookup vector as follows:

lookup <- c("10"=22, "20"=22, "30"=25, "50"=15)

Then you can directly add in the values to your first data frame with the following:

dbase$rate <- sapply(dbase[,1], function(x) unname(lookup[x]))
dbase
  machine_number second_attribute rate
1             10                a   22
2             20                b   22
3             30                c   25
4             10                c   22
5             10                a   22
6             50                d   15

The sapply call takes each value in the first column of dbase and looks it up by name in the lookup vector defined above; unname drops the name from each result.
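
Regarding the edit in the question: the lookup vector does not have to be typed out by hand. A possible sketch builds it from the rates table with setNames() and indexes it in one vectorised step. Here mbase stands in for the data frame read from the CSV, and the column names machine_number and rate are assumptions.

```r
# Stand-in for the rates table read from the CSV; column names are assumed.
mbase <- data.frame(machine_number = c(10, 20, 30, 40, 50),
                    rate = c(22, 22, 25, 17, 15))
dbase <- data.frame(machine_number = c("10", "20", "30", "10", "10", "50"),
                    second_attribute = c("a", "b", "c", "c", "a", "d"),
                    stringsAsFactors = FALSE)

# Named vector: names are the machine numbers, values are the rates.
lookup <- setNames(mbase$rate, as.character(mbase$machine_number))

# Index by name for the whole column at once. as.character() matters when
# the key column is a factor, since a factor would index by integer code.
dbase$rate <- unname(lookup[as.character(dbase$machine_number)])
dbase
```

Indexing the whole column at once avoids calling a function per row, which should matter with around a million rows.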


 