
Difference between nodupkey in SAS and SELECT DISTINCT * FROM table_name in SQL

I have a data set with 2 fields storing strings.

1. In SAS, when I run a nodupkey on the data set, I get ~200 records.
2. In SQL, when I run a SELECT DISTINCT / GROUP BY / PARTITION BY, I get ~2000 records.

This SQL code runs on Hive, which is hosted on an AWS EMR server.

The data set I am working on has NULL in some of the records for one of the fields. I am not doing anything else apart from what I mentioned in points 1 and 2.

I am looking for an explanation of why there is such a large mismatch between these two results when I am only doing a simple duplicate removal.

DISTINCT operates on all fields in the SELECT statement, and the database will likely treat NULLs and blanks as different values. SAS does not treat nulls and blanks as different, and NODUPKEY filters only on the variables listed in the BY statement. So if the BY statement names only one of the two fields, SAS keeps one record per value of that field, while DISTINCT keeps every distinct combination of both fields.
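To make the two effects concrete, here is a small Python sketch that imitates both behaviours on made-up data. It is not SAS or Hive code. The sample rows, the column layout, and the helper names `nodupkey`, `select_distinct` and `blank_nulls` are all hypothetical, and the sorting step of PROC SORT is ignored, since only the row counts matter here.

```python
# Hypothetical sample rows: (name, city).
# None stands for a SQL NULL, "" for a blank string.
rows = [
    ("alice", "NY"),
    ("alice", None),
    ("alice", ""),
    ("bob", "LA"),
    ("bob", "LA"),
]


def nodupkey(rows, key_indexes):
    """Keep the first row for each combination of BY-key values.

    Imitates the de-duplication part of PROC SORT NODUPKEY;
    columns that are not in the key are ignored when comparing.
    """
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def select_distinct(rows):
    """Keep one copy of each full row, comparing every column.

    None and "" compare as different here, as NULL and an empty
    string typically do in a SQL engine.
    """
    return list(dict.fromkeys(rows))


def blank_nulls(rows):
    """Replace None with "" so that NULL and blank compare as equal."""
    return [tuple("" if value is None else value for value in row) for row in rows]


by_key = nodupkey(rows, key_indexes=[0])          # dedupe on name only
all_columns = select_distinct(rows)               # dedupe on name and city
normalized = select_distinct(blank_nulls(rows))   # NULL and blank merged first

print(len(by_key), len(all_columns), len(normalized))
```

On this sample the key-only de-duplication keeps 2 rows, the all-column DISTINCT keeps 4, and DISTINCT after mapping NULL to blank keeps 3. To get matching counts in practice, compare on the same set of columns on both sides and normalize NULLs and blanks the same way before de-duplicating.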
