
Group by not detecting duplicates but there are dupes. Strange SQL Server - Azure Synapse database dedicated SQL pool

I have encountered a strange (until I understand the logical reason) behaviour of GROUP BY in a SQL Server database. A table contains many duplicates: when I query it I get duplicate rows, but when I try to find all dupes using the GROUP BY or ROW_NUMBER strategy I get 0 records. When I add CAST to the GROUP BY / ROW_NUMBER expressions, I get the correct list of duplicates.

The data type is nvarchar for all 3 keys.
Can someone tell me why this is happening?

Added the queries and their output:


-- Rows for one VBELN, together with the LEN() of each key column
select top 10 len(VBELN) len_vblen, len(MANDT), len(posnr), *
from [SRC_SAP_R3].[LIPS]
where VBELN = '6316785926';

-- Duplicate check with the keys cast to nvarchar
select cast(MANDT as nvarchar) as "MANDT", cast(VBELN as nvarchar) as "VBELN", cast(posnr as nvarchar) as "posnr", count(*)
from [SRC_SAP_R3].[LIPS]
group by cast(MANDT as nvarchar), cast(VBELN as nvarchar), cast(posnr as nvarchar)
having count(*) > 1;

-- Duplicate check with the keys cast to varchar
select cast(MANDT as varchar) as "MANDT", cast(VBELN as varchar) as "VBELN", cast(posnr as varchar) as "posnr", count(*)
from [SRC_SAP_R3].[LIPS]
group by cast(MANDT as varchar), cast(VBELN as varchar), cast(posnr as varchar)
having count(*) > 1;

-- Duplicate check without CAST (the variant that returns 0 records)
select MANDT, VBELN, posnr, count(1)
from [SRC_SAP_R3].[LIPS]
group by MANDT, VBELN, posnr
having count(1) > 1;
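
The ROW_NUMBER strategy mentioned above is not shown in the question. One common form of it, sketched here with the question's table and column names, numbers the rows within each key combination and keeps those numbered 2 or higher. The CTE name `numbered` and the alias `rn` are illustrative, and the ordering column is an arbitrary choice.

```sql
-- Sketch only: ROW_NUMBER-based duplicate check.
-- Table and column names come from the question; "numbered" and "rn" are made up.
with numbered as (
    select MANDT, VBELN, posnr,
           row_number() over (partition by MANDT, VBELN, posnr
                              order by posnr) as rn
    from [SRC_SAP_R3].[LIPS]
)
select MANDT, VBELN, posnr, rn
from numbered
where rn > 1;  -- any row beyond the first in its key group is a duplicate
```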

I tried to reproduce this in Azure Synapse Analytics. As @Martin Smith said, the LEN() function ignores trailing spaces when computing the length of a column value. When I used the DATALENGTH() function instead, the trailing spaces were included in the length. Below is the repro.

  • A table is created with a varchar column. One value is inserted with trailing spaces and the other without.
create table SAP_TAB (VBELN varchar(100));
-- two trailing spaces, consistent with the DATALENGTH of 5 in the result
insert into SAP_TAB values ('500  ');
insert into SAP_TAB values ('500');
  • Then the LEN() and DATALENGTH() functions are applied to the data. The data is also cast to varchar, and LEN() is applied to the cast value. Below is the query.
select VBELN,
       len(VBELN) as [length_VBELN],
       datalength(VBELN) as [data_length_VBELN],
       len(cast(VBELN as varchar(10))) as [length_varchar_casted_VBELN]
from SAP_TAB;

Result

VBELN   length_VBELN   data_length_VBELN   length_varchar_casted_VBELN
500     3              5                   3
500     3              3                   3
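
To check whether the key columns of the table in the question carry trailing spaces, a query along the following lines could be used. It is only a sketch: it reuses the table and column names from the question, and the aliases are made up for illustration. It compares DATALENGTH() before and after RTRIM() rather than DATALENGTH() against LEN(), so that it works for nvarchar (two bytes per character) as well as varchar.

```sql
-- Sketch only: list rows whose key values end in one or more spaces.
-- Table and column names come from the question; the aliases are illustrative.
select MANDT, VBELN, posnr,
       len(VBELN)        as len_vbeln,         -- ignores trailing spaces
       datalength(VBELN) as datalength_vbeln   -- bytes used, trailing spaces included
from [SRC_SAP_R3].[LIPS]
where datalength(VBELN) <> datalength(rtrim(VBELN))
   or datalength(MANDT) <> datalength(rtrim(MANDT))
   or datalength(posnr) <> datalength(rtrim(posnr));
```

RTRIM() removes only space characters, so other trailing whitespace such as tabs would not be caught by this check, and rows where a key is NULL are skipped.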


