Join / Merge two tables removing new duplicates [PROC SQL in SAS]

Similar questions have been asked on the forums, but I seem to have a unique issue with mine. I'm not sure if this is because I don't have a unique ID or because my KEY is my actual data. I hope you guys can help.

I am trying to merge two tables (Old and New) that have identical column structures.

I want to retain all my values in the Old table and append ONLY the new keys from the New table into a Combined table. Any keys that exist in both tables should take on the value of the Old table.

OLD TABLE
Key | Points
AAA | 1
BBB | 2
CCC | 3

NEW TABLE
Key | Points
AAA | 2
BBB | 5
CCC | 8
DDD | 6

Combined TABLE
Key | Points
AAA | 1
BBB | 2
CCC | 3
DDD | 6

I feel like what I want to achieve is the Venn diagram equivalent of this:

[Image: Venn diagram]

... but for whatever reason I'm not getting the intended effect with this code:

CREATE TABLE Combined
SELECT * FROM Old as A
FULL OUTER JOIN New as B ON A.Key=B.Key
WHERE A.Key IS NULL OR B.Key IS NULL;

This may help you.

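-- SQL Server style syntax (bracketed names, CROSS APPLY); this does not run in SAS PROC SQL.
-- MIN picks the smallest candidate, so a matched key is only guaranteed
-- the old value when it does not exceed the new one.
SELECT B.[Key],
       MIN(CASE WHEN A.[Key] = B.[Key] THEN A.Points ELSE B.Points END) AS 'Points'
FROM OldTable A
CROSS APPLY NewTable B
GROUP BY B.[Key]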

As long as there are no duplicate values for key in either table:

proc sql;
  CREATE TABLE combined AS
  SELECT COALESCE(a.key,b.key) AS key, COALESCE(a.points,b.points) AS points
    FROM old a FULL OUTER JOIN new b ON a.key EQ b.key;
quit;

COALESCE returns its first argument if that value is not missing, and returns the second argument otherwise.
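To see that rule on its own, here is a minimal sketch using the DATA step COALESCE function, which handles numeric values only (PROC SQL's COALESCE also accepts character columns). The variable names both_present and first_missing are made up for illustration, and the literals mirror the Points values for AAA and DDD in the question.

```sas
/* Illustration only: the variable names are hypothetical. */
data _null_;
  /* Old value present: the old value wins (key AAA: old 1, new 2). */
  both_present = coalesce(1, 2);
  /* Old value missing: fall back to the new value (key DDD: new 6). */
  first_missing = coalesce(., 6);
  /* Write both results to the SAS log. */
  put both_present= first_missing=;
run;
```

In the join above, a.points is missing exactly for keys that exist only in the new table, which is why those rows pick up b.points.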

If you don't have duplicate keys within either table, then a simple UPDATE statement in a data step will do the job. You just need to make sure that NEW_TABLE is first in the list, so that the values in OLD_TABLE replace those in NEW_TABLE wherever the key matches. Any keys unique to one table will be output automatically.

Your data needs to be sorted by Key, as it is in your example.

data OLD_TABLE;
input Key $ Points;
datalines;
AAA 1
BBB 2
CCC 3
;
run;

data NEW_TABLE;
input Key $ Points;
datalines;
AAA 2
BBB 5
CCC 8
DDD 6
;
run;

data want;
update new_table old_table;
by key;
run;
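Since this approach relies on each key appearing only once per table, it may be worth checking that first. The following is a rough sketch, assuming the OLD_TABLE and NEW_TABLE data sets created above; the column alias n_rows is made up for illustration.

```sas
/* Sketch: list any key that appears more than once in either table. */
proc sql;
  /* Keys repeated in the old table */
  select key, count(*) as n_rows
    from old_table
    group by key
    having count(*) > 1;

  /* Keys repeated in the new table */
  select key, count(*) as n_rows
    from new_table
    group by key
    having count(*) > 1;
quit;
```

If both queries return no rows, the no-duplicates assumption holds for that data.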

Sort the datasets by key:

proc sort data=old; 
    by key; 
run;
proc sort data=new; 
    by key; 
run;

Combine them in a data step with SET and BY, and output only the first row for each key. Because old is listed first, the old row wins when a key is in both tables:

data combined;
set 
    old
    new;
by key;
if first.key then output;
run;
