I'm diving into the world of data.tables and so far I enjoy the syntax, as I find I can do a lot more while writing a lot less. It is a bit exotic at times, however.
Here's one thing I need to figure out--I know how to do joins, such as x[y], but what I need to do is a bit more complex (but still pretty simple!).
Our sales database suffers from many variations of the same rep's name, so I keep a separate list that tells me when two names are actually the same rep. The sales data (the $$'s) might contain one or two versions of a particular rep's name (usually it's the first one, but sometimes a name was misspelled for the first few months of the year and then corrected).
I'll provide two sample data.tables that I want to combine. I don't know HOW to get the result I want, but I will also write out what I want to occur.
library(data.table)
DT1 <- data.table(name=c("Bob Smith", "Robert Smith", "Mary Stone", "Maryanne Stone", "Jason Hasberg"),
                  sales=c(12, 15, 23, 10, 11))
DT2 <- data.table(correctname=c("Bob Smith", "Maryanne Stone", "Jason Hasberg"),
                  namechoice1=c("Robert Smith", "Mary Stone", "Jason Hasberg"),
                  namechoice2=c("Bob Smith", "Maryanne Stone", NA))
DT1
             name sales
1:      Bob Smith    12
2:   Robert Smith    15
3:     Mary Stone    23
4: Maryanne Stone    10
5:  Jason Hasberg    11
DT2
      correctname   namechoice1    namechoice2
1:      Bob Smith  Robert Smith      Bob Smith
2: Maryanne Stone    Mary Stone Maryanne Stone
3:  Jason Hasberg Jason Hasberg             NA
So in ENGLISH: if name in DT1 matches either namechoice1 or namechoice2, use correctname for that line item, then sum the sales of all the name variants under that correct name.
(Watch out: I threw in an NA for Jason, as very often the name doesn't need correcting.)
Expected result:
      correctname sales
1:      Bob Smith    27
2: Maryanne Stone    33
3:  Jason Hasberg    11
I'm hoping for an answer in as few lines as possible, but perhaps some further subsetting is needed before the final sum can be calculated.
Looking forward to your answers, THANK YOU!!
You need to melt your name map table into long format, so that you have one row per alias, with each row also containing the correct name. Then you can just join on the alias and aggregate on the correct name:
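# one row per (correctname, alias) pair; drop the missing aliases
DT2.new <- melt(DT2, id.vars="correctname")[!is.na(value), list(correctname, value)]
# key on the alias so DT1's first column (name) is matched against it
setkey(DT2.new, value)
# join the sales rows onto the alias map, then total sales per correct name
DT2.new[DT1][, sum(sales), by=correctname]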
Produces:
      correctname V1
1:      Bob Smith 27
2: Maryanne Stone 33
3:  Jason Hasberg 11
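As a variation on the same melt-then-join idea, here is a minimal sketch that names the summed column sales (so it matches the expected result) and joins with on= instead of setting a key. It assumes a data.table version recent enough to support on= (1.9.6 or later) and the na.rm argument of melt; the name alias_map is only an illustrative choice, and the unique() call is a precaution in case the same alias is listed twice for one rep.

```r
library(data.table)

DT1 <- data.table(name  = c("Bob Smith", "Robert Smith", "Mary Stone",
                            "Maryanne Stone", "Jason Hasberg"),
                  sales = c(12, 15, 23, 10, 11))
DT2 <- data.table(correctname = c("Bob Smith", "Maryanne Stone", "Jason Hasberg"),
                  namechoice1 = c("Robert Smith", "Mary Stone", "Jason Hasberg"),
                  namechoice2 = c("Bob Smith", "Maryanne Stone", NA))

# Long format: one row per (correctname, alias); na.rm = TRUE drops the NA alias
alias_map <- melt(DT2, id.vars = "correctname", value.name = "name", na.rm = TRUE)
# Keep only the two columns needed and guard against repeated aliases
alias_map <- unique(alias_map[, .(correctname, name)])

# Look up the correct name for every DT1 row, then total sales per correct name
result <- alias_map[DT1, on = "name"][, .(sales = sum(sales)), by = correctname]
result
```

Any name in DT1 that is missing from the alias map would end up with an NA correctname, so it is worth checking for such rows before trusting the totals.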
Note that the better way of storing your aliases is in the format of DT2.new. Among other things, this allows you to have a different number of aliases for each person, instead of needing as many columns as the employee with the most aliases has.
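To illustrate the point about long format, here is a small sketch in which the alias table is maintained directly with one row per alias. The data is hypothetical: "Bobby Smith" and its sales figure are invented for the example, and the names aliases, sales_dt and totals are illustrative only. One rep has three aliases and another has one, with no NA padding needed.

```r
library(data.table)

# Hypothetical alias table kept in long format; "Bobby Smith" is an invented alias
aliases <- data.table(
  correctname = c("Bob Smith", "Bob Smith", "Bob Smith", "Jason Hasberg"),
  name        = c("Bob Smith", "Robert Smith", "Bobby Smith", "Jason Hasberg")
)

# Hypothetical sales rows using those aliases
sales_dt <- data.table(
  name  = c("Bob Smith", "Robert Smith", "Bobby Smith", "Jason Hasberg"),
  sales = c(12, 15, 4, 11)
)

# Same join-and-aggregate step as before; no melt is needed
totals <- aliases[sales_dt, on = "name"][, .(sales = sum(sales)), by = correctname]
totals
```

Adding a new alias is then just a matter of appending a row, rather than adding a column.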