简体   繁体   中英

How to join tables and generate sums of columns?

I have several tables( two in particular example) with the same structure. I would like to join on ID_Position & ID_Name and generate the sum of January and February in the output table (There might be some NAs in both columns)

ID_Position<-c(1,2,3,4,5,6,7,8,9,10)
Position<-c("A","B","C","D","E","H","I","J","X","W")
ID_Name<-c(11,12,13,14,15,16,17,18,19,20)
Name<-c("Michael","Tobi","Chris","Hans","Likas","Martin","Seba","Li","Sha","Susi")
  jan<-c(10,20,30,22,23,2,22,24,26,28)
  feb<-c(10,30,20,12,NA,3,NA,22,24,26)

df1 <- data.frame(ID_Position,Position,ID_Name,Name,jan,feb)


ID_Position<-c(1,2,3,4,5,6,7,8,9,10)
Position<-c("A","B","C","D","E","H","I","J","X","W")
ID_Name<-c(11,12,13,14,15,16,17,18,19,20)
 Name<-c("Michael","Tobi","Chris","Hans","Likas","Martin","Seba","Li","Sha","Susi")
  jan<-c(10,20,30,22,NA,NA,22,24,26,28)
  feb<-c(10,30,20,12,23,3,3,22,24,26)

  df2 <- data.frame(ID_Position,Position,ID_Name,Name,jan,feb)

I tried the inner and the full join. But that seems to work as I desire:

   library(plyr)

    test<-join(df1, df2, by =c("ID_Position","ID_Name") , type = "inner", match = "all")

Desired output:

  ID_Position   Position    ID_Name       Name         jan  feb
      1            A          11          Michael        20 20
      2            B          12          Tobi           40 60
      3            C          13          Chris          60 40
      4            D          14          Hans           44 24
      5            E          15          Likas          23 23
      6            H          16          Martin         2  6
      7            I          17          Seba           44 22
      8            J          18          Li             48 44
      9            X          19          Sha            52 48
     10            W          20          Susi           56 52

Your desired output doesn't seem entirely correct, but here's an example of how you can do this efficiently using data.table binary join which allows you to efficiently run functions while joining using the by = .EACHI option

library(data.table)
setkey(setDT(df1), ID_Position, ID_Name, Name) 
setkey(setDT(df2), ID_Position, ID_Name, Name)
df2[df1, .(jan = sum(jan, i.jan, na.rm = TRUE), 
           feb = sum(feb, i.feb, na.rm = TRUE)), 
    by = .EACHI]
#     ID_Position ID_Name    Name jan feb
#  1:           1      11 Michael  20  20
#  2:           2      12    Tobi  40  60
#  3:           3      13   Chris  60  40
#  4:           4      14    Hans  44  24
#  5:           5      15   Likas  46   0
#  6:           6      16  Martin   0   6
#  7:           7      17    Seba  44   0
#  8:           8      18      Li  48  44
#  9:           9      19     Sha  52  48
# 10:          10      20    Susi  56  52

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM