I have two tables I need to join. They share only one field (ID, and it isn't unique). Is it possible to join them so that each row is used only once, while keeping all the data from both tables?
For example, I have two tables as follows:
+-------+----------+
| ID    | NAME     |
+-------+----------+
| A     | Jack     |
| A     | Andy     |
| A     | Steve    |
| A     | Jay      |
| B     | Chris    |
| B     | Vicky    |
| B     | Emma     |
+-------+----------+
And another table that is ONLY related by the ID column:
+-------+--------+
| ID    | Age    |
+-------+--------+
| A     | 22     |
| A     | 31     |
| A     | 11     |
| B     | 40     |
| B     | 17     |
| B     | 20     |
| B     | 3      |
| B     | 65     |
+-------+--------+
The end result I'd like to get is:
+-------+----------+-------+
| ID    | NAME     | Age   |
+-------+----------+-------+
| A     | Jack     | 22    |
| A     | Andy     | 31    |
| A     | Steve    | 11    |
| A     | Jay      | null  |
| B     | Chris    | 40    |
| B     | Vicky    | 17    |
| B     | Emma     | 20    |
| B     | null     | 3     |
| B     | null     | 65    |
+-------+----------+-------+
This is close to the default behavior of a data step merge with a BY statement. The one difference is that when one table runs out of rows within an ID, merge keeps the last value it read instead of setting it to missing, but that is easy to work around.
There are other ways to do this; the best, in my opinion, is the hash object, if you're comfortable with it.
data names;
  infile datalines dlm='|';
  input ID $ NAME $;
  datalines;
| A | Jack |
| A | Andy |
| A | Steve |
| A | Jay |
| B | Chris |
| B | Vicky |
| B | Emma |
;;;;
run;
data ages;
  infile datalines dlm='|';
  input id $ age;
  datalines;
| A | 22 |
| A | 31 |
| A | 11 |
| B | 40 |
| B | 17 |
| B | 20 |
| B | 3 |
| B | 65 |
;;;;
run;
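data want;
  merge names(in=_a) ages(in=_b);
  by id;
  if _a; /* keep only IDs that are present in names */
  /* Output only when a new name was read. Once names runs out within an ID,
     name keeps its previous value, so those rows are skipped (here the extra
     ages 3 and 65 for B are not output). This assumes two consecutive rows
     never have the same name; if they can, more work is needed here. */
  if name ne lag(name) then output;
  /* Clear age after the output. Merge retains values within a BY group, so
     without this the last age would be repeated on any extra name rows. */
  call missing(age);
run;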
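If you also need the extra ages (the B rows with no name in the desired output), one option is to number the rows within each ID and merge on that number as well. This is only a minimal sketch, assuming both tables are already sorted by ID; the names `names_n`, `ages_n`, `want_all` and `seq` are made up for illustration.

```sas
/* Number the rows within each ID: 1, 2, 3, ... */
data names_n;
  set names;
  by id;
  if first.id then seq = 0;
  seq + 1; /* sum statement: seq is retained across rows */
run;

data ages_n;
  set ages;
  by id;
  if first.id then seq = 0;
  seq + 1;
run;

/* Pair the nth name with the nth age. Each (id, seq) group has at most one
   row per table, so an unmatched side is simply left missing. */
data want_all;
  merge names_n ages_n;
  by id seq;
  drop seq;
run;
```

With the sample data above this should give nine rows: Jay with a missing age, and the ages 3 and 65 for B with a missing name. It also does not rely on names being different from one row to the next.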