
SAS SQL: Many-to-many relationship between 2 tables, but I don't want multiple rows

I have two tables I need to join. They share only one field, ID, and it isn't unique in either table. Is it possible to join them so that each row from one table is paired with at most one row from the other, instead of getting every combination of rows within an ID?

For example, I have two tables as follows:

+-------+----------+
| ID    |   NAME   |
+-------+----------+
| A     | Jack     |
| A     | Andy     |
| A     | Steve    |
| A     | Jay      |
| B     | Chris    |
| B     | Vicky    |
| B     | Emma     |
+-------+----------+  

And another table that is ONLY related by the ID column:

+-------+--------+
| ID    |    Age |
+-------+--------+
| A     |     22 |
| A     |     31 |
| A     |     11 |
| B     |     40 |
| B     |     17 |
| B     |     20 |
| B     |      3 |
| B     |     65 |
+-------+--------+  

The end result I'd like to get is:

+-------+----------+--------+
| ID    |   NAME   |  Age   |
+-------+----------+--------+
| A     | Jack     |  22    |
| A     | Andy     |  31    |
| A     | Steve    |  11    |
| A     | Jay      |  null  |
| B     | Chris    |  40    |
| B     | Vicky    |  17    |
| B     | Emma     |  20    |
| B     | null     |   3    |
| B     | null     |  65    |
+-------+----------+--------+

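This is the default behavior of the data step MERGE with a BY statement, with one exception: when one table runs out of rows within an ID, the merge keeps repeating that table's last value instead of setting it to missing. That is easy to work around.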

There are other ways to do this, the best in my opinion being the hash object if you're comfortable with that.

data names;
infile datalines dlm='|';
input ID $ NAME $;
datalines;
| A     | Jack     |
| A     | Andy     |
| A     | Steve    |
| A     | Jay      |
| B     | Chris    |
| B     | Vicky    |
| B     | Emma     |
;;;;
run;

data ages;
infile datalines dlm='|';
input id $ age;
datalines;
| A     |     22 |
| A     |     31 |
| A     |     11 |
| B     |     40 |
| B     |     17 |
| B     |     20 |
| B     |      3 |
| B     |     65 |
;;;;
run;


data want;
  merge names ages;
  by id;   /* both tables must be sorted by id */
  output;  /* write the row first, then clear the merged variables */
  /* MERGE retains values within a BY group, so without this call the last
     name or age of the shorter table would repeat on the extra rows */
  call missing(name, age);
run;
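
For something closer to the SQL in the question title, one option is to number the rows within each ID and then full-join on both ID and that row number. The following is only a sketch of that idea: the names names_seq, ages_seq, want_sql, and seq are made up for illustration, and it assumes both tables are already sorted by ID, as the example data is.

```sas
/* Number the rows within each ID: 1, 2, 3, ... */
data names_seq;
  set names;
  by id;
  if first.id then seq = 0;
  seq + 1;  /* sum statement: seq is retained across rows */
run;

data ages_seq;
  set ages;
  by id;
  if first.id then seq = 0;
  seq + 1;
run;

/* Pair row k of names with row k of ages inside each ID.
   A full join keeps the unpaired rows from either side. */
proc sql;
  create table want_sql(drop=seq) as
  select coalesce(n.id, a.id)   as id,
         coalesce(n.seq, a.seq) as seq,
         n.name,
         a.age
  from names_seq as n
       full join ages_seq as a
         on n.id = a.id and n.seq = a.seq
  order by 1, 2;
quit;
```

Because the join condition includes the row number, each name is matched with at most one age, and the side that runs out of rows is left missing rather than repeated.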
