In R, I would like to join two tables that have no headers. The only thing they have in common is the first column, which always holds the IDs. The tables do not have the same number of columns or rows.
I want to join this table (no header):
+-------+--------+--------+
| 80938 | James | Nov-00 |
+-------+--------+--------+
| 78397 | Tom | Jul-20 |
+-------+--------+--------+
| 73820 | Pan | Sep-10 |
+-------+--------+--------+
| 64920 | Kim | Nov-01 |
+-------+--------+--------+
| 83915 | Amanda | Jan-03 |
+-------+--------+--------+
| 83649 | Linda | Jul-07 |
+-------+--------+--------+
with this table (also no header):
+-------+---+--------+--------+--------+--------+
| 80938 | 1 | 500000 | 600000 | 700000 | 800000 |
+-------+---+--------+--------+--------+--------+
| 80938 | 2 | 333 | 456 | 567 | 467 |
+-------+---+--------+--------+--------+--------+
| 80938 | 3 | 444 | 456 | 399 | 799 |
+-------+---+--------+--------+--------+--------+
| 80938 | 4 | 20000 | 4000 | 3222 | 3456 |
+-------+---+--------+--------+--------+--------+
| 80938 | 5 | 21305 | 23456 | 3567 | 8533 |
+-------+---+--------+--------+--------+--------+
| 80938 | 6 | 345067 | 2455 | 23356 | 244567 |
+-------+---+--------+--------+--------+--------+
to get the combined table below:
+-------+--------+--------+---+--------+--------+--------+--------+
| 80938 | James | Nov-00 | 1 | 500000 | 600000 | 700000 | 800000 |
+-------+--------+--------+---+--------+--------+--------+--------+
| 80938 | James | Dec-00 | 2 | 333 | 456 | 567 | 467 |
+-------+--------+--------+---+--------+--------+--------+--------+
| 80938 | James | Jan-01 | 3 | 444 | 456 | 399 | 799 |
+-------+--------+--------+---+--------+--------+--------+--------+
| 80938 | James | Feb-01 | 4 | 20000 | 4000 | 3222 | 3456 |
+-------+--------+--------+---+--------+--------+--------+--------+
| 80938 | James | Mar-01 | 5 | 21305 | 23456 | 3567 | 8533 |
+-------+--------+--------+---+--------+--------+--------+--------+
| 80938 | James | Apr-01 | 6 | 345067 | 2455 | 23356 | 244567 |
+-------+--------+--------+---+--------+--------+--------+--------+
| 78397 | Tom | 20-Jul | 1 | 4728 | 82920 | 39 | 323992 |
+-------+--------+--------+---+--------+--------+--------+--------+
| 78397 | Tom | 21-Jul | 2 | 38120 | 3820 | 38292 | 2920 |
+-------+--------+--------+---+--------+--------+--------+--------+
| 78397 | Tom | 22-Jul | 3 | 39302 | 238202 | 23920 | 2822 |
+-------+--------+--------+---+--------+--------+--------+--------+
| 78397 | Tom | 23-Jul | 4 | 3920 | 28202 | 293 | 83920 |
+-------+--------+--------+---+--------+--------+--------+--------+
| 78397 | Tom | 24-Jul | 5 | 3830 | 820230 | 9292 | 2929 |
+-------+--------+--------+---+--------+--------+--------+--------+
| 78397 | Tom | 25-Jul | 6 | 12380 | 29202 | 2929 | 8292 |
+-------+--------+--------+---+--------+--------+--------+--------+
| 73820 | Pan | 10-Sep | | | | | |
+-------+--------+--------+---+--------+--------+--------+--------+
| 64920 | Kim | 1-Nov | | | | | |
+-------+--------+--------+---+--------+--------+--------+--------+
| 83915 | Amanda | 3-Jan | | | | | |
+-------+--------+--------+---+--------+--------+--------+--------+
| 83649 | Linda | 7-Jul | | | | | |
+-------+--------+--------+---+--------+--------+--------+--------+
I tried full_join and merge, but I keep getting an error message. (I read the files with read.csv and then converted them with data.frame so that I could join on the positional column name V1, but that did not work.)
The example in your question cannot produce the expected output: your second table only has rows matching James' ID and none for Tom's. I'm therefore going to assume that your second table is incomplete relative to the expected output, and that your input data looks like this:
csv1 <- structure(list(V1 = c(80938L, 78397L, 73820L, 64920L, 83915L,
83649L), V2 = c("James", "Tom", "Pan", "Kim", "Amanda", "Linda"
), V3 = c("Nov-00", "Jul-20", "Sep-10", "Nov-01", "Jan-03", "Jul-07"
)), class = "data.frame", row.names = c(NA, -6L))
csv1
#> V1 V2 V3
#> 1 80938 James Nov-00
#> 2 78397 Tom Jul-20
#> 3 73820 Pan Sep-10
#> 4 64920 Kim Nov-01
#> 5 83915 Amanda Jan-03
#> 6 83649 Linda Jul-07
and
csv2 <- structure(list(V1 = c(80938L, 80938L, 80938L, 80938L, 80938L,
80938L, 78397L, 78397L, 78397L, 78397L, 78397L, 78397L), V2 = c(1L,
2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L), V3 = c(500000L,
333L, 444L, 20000L, 21305L, 345067L, 4728L, 38120L, 39302L, 3920L,
3830L, 12380L), V4 = c(600000L, 456L, 456L, 4000L, 23456L, 2455L,
82920L, 3820L, 238202L, 28202L, 820230L, 29202L), V5 = c(700000L,
567L, 399L, 3222L, 3567L, 23356L, 39L, 38292L, 23920L, 293L,
9292L, 2929L), V6 = c(800000L, 467L, 799L, 3456L, 8533L, 244567L,
323992L, 2920L, 2822L, 83920L, 2929L, 8292L)), row.names = c(NA,
-12L), class = "data.frame")
csv2
#> V1 V2 V3 V4 V5 V6
#> 1 80938 1 500000 600000 700000 800000
#> 2 80938 2 333 456 567 467
#> 3 80938 3 444 456 399 799
#> 4 80938 4 20000 4000 3222 3456
#> 5 80938 5 21305 23456 3567 8533
#> 6 80938 6 345067 2455 23356 244567
#> 7 78397 1 4728 82920 39 323992
#> 8 78397 2 38120 3820 38292 2920
#> 9 78397 3 39302 238202 23920 2822
#> 10 78397 4 3920 28202 293 83920
#> 11 78397 5 3830 820230 9292 2929
#> 12 78397 6 12380 29202 2929 8292
Creating the join is very straightforward: you want to left join csv2 onto csv1 like this:
library(dplyr)
csv1 %>%
left_join(csv2, by = "V1")
#> V1 V2.x V3.x V2.y V3.y V4 V5 V6
#> 1 80938 James Nov-00 1 500000 600000 700000 800000
#> 2 80938 James Nov-00 2 333 456 567 467
#> 3 80938 James Nov-00 3 444 456 399 799
#> 4 80938 James Nov-00 4 20000 4000 3222 3456
#> 5 80938 James Nov-00 5 21305 23456 3567 8533
#> 6 80938 James Nov-00 6 345067 2455 23356 244567
#> 7 78397 Tom Jul-20 1 4728 82920 39 323992
#> 8 78397 Tom Jul-20 2 38120 3820 38292 2920
#> 9 78397 Tom Jul-20 3 39302 238202 23920 2822
#> 10 78397 Tom Jul-20 4 3920 28202 293 83920
#> 11 78397 Tom Jul-20 5 3830 820230 9292 2929
#> 12 78397 Tom Jul-20 6 12380 29202 2929 8292
#> 13 73820 Pan Sep-10 NA NA NA NA NA
#> 14 64920 Kim Nov-01 NA NA NA NA NA
#> 15 83915 Amanda Jan-03 NA NA NA NA NA
#> 16 83649 Linda Jul-07 NA NA NA NA NA
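For comparison, here is a minimal sketch of the same left join in base R, starting from headerless CSV text. The tables are small hypothetical excerpts inlined with read.csv(text = ...) so the example runs without files; with real files, read.csv("file.csv", header = FALSE) should behave the same way. With header = FALSE, read.csv names the columns V1, V2, ... itself, so no extra data.frame() step is needed before joining on "V1".

```r
# Small headerless excerpts, inlined so the example is self-contained.
people <- read.csv(text = c("80938,James,Nov-00",
                            "78397,Tom,Jul-20",
                            "73820,Pan,Sep-10"),
                   header = FALSE, stringsAsFactors = FALSE)
values <- read.csv(text = c("80938,1,500000,600000",
                            "80938,2,333,456"),
                   header = FALSE)

# header = FALSE gives the default column names V1, V2, V3, ...
print(names(people))

# all.x = TRUE keeps every row of `people` (a left join); columns that
# appear in both tables get the default suffixes .x and .y.
joined <- merge(people, values, by = "V1", all.x = TRUE)
print(joined)
```

Note that merge() sorts the result by the key column by default, so the row order can differ from the dplyr version even though the rows themselves match.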
However, it seems you would rather have blank cells than NA, in which case you need to convert the numeric columns to characters and replace the NA values with empty strings:
csv1 %>%
left_join(csv2, by = "V1") %>%
mutate_all(function(x) replace(x, is.na(as.character(x)), ""))
#> V1 V2.x V3.x V2.y V3.y V4 V5 V6
#> 1 80938 James Nov-00 1 500000 600000 700000 800000
#> 2 80938 James Nov-00 2 333 456 567 467
#> 3 80938 James Nov-00 3 444 456 399 799
#> 4 80938 James Nov-00 4 20000 4000 3222 3456
#> 5 80938 James Nov-00 5 21305 23456 3567 8533
#> 6 80938 James Nov-00 6 345067 2455 23356 244567
#> 7 78397 Tom Jul-20 1 4728 82920 39 323992
#> 8 78397 Tom Jul-20 2 38120 3820 38292 2920
#> 9 78397 Tom Jul-20 3 39302 238202 23920 2822
#> 10 78397 Tom Jul-20 4 3920 28202 293 83920
#> 11 78397 Tom Jul-20 5 3830 820230 9292 2929
#> 12 78397 Tom Jul-20 6 12380 29202 2929 8292
#> 13 73820 Pan Sep-10
#> 14 64920 Kim Nov-01
#> 15 83915 Amanda Jan-03
#> 16 83649 Linda Jul-07
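On dplyr 1.0.0 or later, mutate_all() is superseded by across(). A sketch of the same blank-cell replacement written with across(), making the conversion to character explicit, is below; the small joined data frame is a hypothetical stand-in for the joined result above.

```r
library(dplyr)

# Hypothetical excerpt of the joined result: one matched row, one unmatched.
joined <- data.frame(V1 = c(80938L, 73820L),
                     V2.x = c("James", "Pan"),
                     V2.y = c(1L, NA))

# Convert every column to character, then turn NA into an empty string.
blanked <- joined %>%
  mutate(across(everything(), ~ ifelse(is.na(.x), "", as.character(.x))))
print(blanked)
```

Keep in mind that once the numeric columns are character, they no longer sort or sum as numbers, so this is best done as the last step before display or export.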
Note also that your expected output shows the dates incrementing across repeated entries. However, James' dates appear to increment by month and Tom's by day, with no indication of how this pattern was decided or how it should be produced. I have therefore left the dates as they are, pending your advice.
Created on 2020-08-02 by the reprex package (v0.3.0)
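If the intended rule turns out to be the one James' rows suggest (one month per repeated row, counted from the date in the first table), it could be sketched as below. This is an assumption, not the asker's confirmed rule, and it does not reproduce the day-style labels shown for Tom. shift_month_label() is a hypothetical helper: it assumes labels of the form Mon-yy with English month abbreviations, and it leaves rows without a counter (NA, i.e. no match in the second table) unchanged.

```r
# Hypothetical helper: shift "Mon-yy" labels (e.g. "Nov-00") forward by k months.
# Assumes English three-letter month abbreviations and a two-digit year.
shift_month_label <- function(label, k) {
  k <- as.integer(k)
  k[is.na(k)] <- 0L                               # unmatched rows keep their label
  mon <- match(substr(label, 1, 3), month.abb)    # month number 1..12
  yy  <- as.integer(substr(label, 5, 6))          # two-digit year
  m   <- yy * 12L + (mon - 1L) + k                # months counted from Jan of year "00"
  paste0(month.abb[m %% 12L + 1L], "-", sprintf("%02d", (m %/% 12L) %% 100L))
}

# Usage on a few joined rows: the counter V2.y starts at 1, so shift by V2.y - 1.
joined <- data.frame(V1   = c(80938L, 80938L, 80938L, 73820L),
                     V3.x = c("Nov-00", "Nov-00", "Nov-00", "Sep-10"),
                     V2.y = c(1L, 2L, 3L, NA))
joined$V3.x <- shift_month_label(joined$V3.x, joined$V2.y - 1L)
print(joined$V3.x)
```

Parsing the month by position with month.abb, rather than with as.Date(format = "%b-%y"), avoids depending on the session's locale. The shift has to run before the NA-to-blank replacement, while V2.y is still numeric.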