
How to identify rows with missing column data caused by a hidden # in a .txt file

I have the .txt file below, exported from a source system. When a field in the source system contains a #, the export drops everything after that field, so the remaining fields on that row are empty in the .txt file.

For example:

LINE|PANO| INOW|DEL|EASLN|EBSAP|LIM1IT|NOMIT|VALUE|KTE1|
1|7870|1000000||40500369|10|25624.0||0.00|SERVI TORNG|33277|
2|294|1000000||500324|10|590.84 ||0.00|REFUDIAL GATNGWAM|30448|
3|9410|1000000||200500325|10|5905.61||0.00|SUPLIVER EXTRACNS|37478|
4|573|1000000||600004075|10||||||||
5|739|1000000||700500290|10|40917.37|||||||
6|741|1000000||50500289|10|2782.53 ||0.00|SECUERVIC LUWE|29161|
7|948|1000000||||||||||||
8|996|1000000||960050035|10|7497.3||0.00|SCOUOUT URBISH IDM647 |38271|
9|1320|1000000||800500319|10|1395.93||0.00|TUATO AIRS|36427|
10|12054|1000000||9000287|10|458.42||0.00|SECURICE GOLA|||||

In the example above, lines 4, 5, 7 and 10 are missing data after a certain field because of a # in the source system field. The source system does hold data for these line items.

How can I flag these line items as having missing information when the .txt file is large, around 10 million line items?

Please share a SQL query, or any other way, to identify the line items with missing data.

Another example:

LINE|PANO| INOW|DEL|EASLN|EBSAP|LIM1IT|NOMIT|VALUE|KTE1|
1|7870|1000000||40500369|10|25624.0||0.00|SERVI TORNG|33277|
2|294|1000000||500324|10|590.84 ||0.00|REFUDIAL GATNGWAM|30448|
3|9410|1000000||200500325|10|5905.61||0.00|SUPLIVER EXTRACNS|37478|
4|573|1000000||600004075|10
5|739|1000000||700500290|10|40917.37
6|741|1000000||50500289|10|2782.53 ||0.00|SECUERVIC LUWE|29161|
7|948|1000000
8|996|1000000||960050035|10|7497.3||0.00|SCOUOUT URBISH IDM647 |38271|
9|1320|1000000||800500319|10|1395.93||0.00|TUATO AIRS|36427|
10|12054|1000000||9000287|10|458.42||0.00|SECURICE GOLA

The data is truncated wherever a # exists.
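Rows cut short like this contain fewer | delimiters than complete rows, so one way to flag them is to load each line unsplit into a single-column staging table and count the delimiters. The following is only a minimal sketch: the staging table #RawLines and its column RawLine are hypothetical names, and the expected count of 11 is an assumption taken from lines 1 to 3 of the sample.

```sql
-- Hypothetical staging table: one unsplit line of the file per row.
CREATE TABLE #RawLines (RawLine VARCHAR(8000));

-- A few lines copied from the second example.
INSERT INTO #RawLines (RawLine)
VALUES
 ('3|9410|1000000||200500325|10|5905.61||0.00|SUPLIVER EXTRACNS|37478|')
,('4|573|1000000||600004075|10')
,('7|948|1000000')
,('10|12054|1000000||9000287|10|458.42||0.00|SECURICE GOLA');

-- Assumption: a complete data row contains 11 delimiters.
DECLARE @ExpectedDelimiters INT = 11;

-- Delimiter count = length of the line minus its length with every | removed.
-- DATALENGTH is used instead of LEN because LEN ignores trailing spaces.
SELECT RawLine,
       DATALENGTH(RawLine) - DATALENGTH(REPLACE(RawLine, '|', '')) AS DelimiterCount
FROM #RawLines
WHERE DATALENGTH(RawLine) - DATALENGTH(REPLACE(RawLine, '|', '')) < @ExpectedDelimiters;
```

With these four sample lines, the query returns lines 4, 7 and 10, with 5, 2 and 9 delimiters respectively. The header line of the file has only 10 delimiters, so it would need to be skipped or excluded when loading.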

Would the following do what you require?

I created a temporary table, #HiddenHash, and populated it with some of your example data. You will obviously load the data with BULK INSERT or whatever mechanism you are using.

    -- Sample table; every column is text so that blank fields load as ''.
    CREATE TABLE #HiddenHash
    (
         LINE   VARCHAR(2)
        ,PANO   VARCHAR(25)
        ,INOW   VARCHAR(25)
        ,DEL    VARCHAR(25)
        ,EASLN  VARCHAR(25)
        ,EBSAP  VARCHAR(25)
        ,LIM1IT VARCHAR(25)
        ,NOMIT  VARCHAR(25)
        ,VALUE  VARCHAR(25)
        ,KTE1   VARCHAR(25)
    );

    INSERT INTO #HiddenHash
    VALUES
     ('1','7870','1000000','','40500369','10','25624.0','0.00','SERVI TORNG','33277')
    ,('2','294','1000000','','500324','10','590.84 ','0.00','REFUDIAL GATNGWAM','30448')
    ,('3','9410','1000000','','200500325','10','5905.61','0.00','SUPLIVER EXTRACNS','37478')
    ,('4','573','1000000','','600004075','10','','','','')
    ,('5','739','1000000','','700500290','10','40917.37','','','')
    ,('6','741','1000000','','50500289','10','2782.53 ','0.00','SECUERVIC LUWE','29161')
    ,('7','948','1000000','','','','','','','')
    ,('8','996','1000000','','960050035','10','7497.3','0.00','SCOUOUT URBISH IDM647 ','38271')
    ,('9','1320','1000000','','800500319','10','1395.93','0.00','TUATO AIRS','36427')
    ,('10','12054','1000000','','9000287','10','458.42','0.00','SECURICE GOLA','');

Then I count how many columns there are in the table, excluding DEL, which is blank in every row.

    DECLARE @CountColumns INT

    -- Number of columns that should hold a value in a complete row (DEL excluded).
    SET @CountColumns = (SELECT COUNT(*)
                         FROM TEMPDB.SYS.COLUMNS
                         WHERE NAME <> 'DEL'
                           AND object_id = object_id('tempdb.dbo.#HiddenHash')
                        )

Then count the non-blank columns in each row and show the rows where that count does not match the number of columns held in the variable.

    SELECT LINE,PANO,INOW,EASLN,EBSAP,LIM1IT,NOMIT,VALUE,KTE1
    FROM (
        SELECT
            LINE,PANO,INOW,EASLN,EBSAP,LIM1IT,NOMIT,VALUE,KTE1,
            (
                -- Unpivot the row's columns and count the non-blank ones.
                SELECT COUNT(*)
                FROM (VALUES (LINE),(PANO),(INOW),(EASLN),(EBSAP),(LIM1IT),(NOMIT),
                             (VALUE),(KTE1)) AS Cnt(col)
                WHERE Cnt.col <> ''
            ) AS NotBlank
        FROM #HiddenHash
    ) cc
    WHERE cc.NotBlank <> @CountColumns

Which gives the following result:

LINE  PANO   INOW     EASLN      EBSAP  LIM1IT    NOMIT  VALUE          KTE1
4     573    1000000  600004075  10
5     739    1000000  700500290  10     40917.37
7     948    1000000
10    12054  1000000  9000287    10     458.42    0.00   SECURICE GOLA
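Because the export cuts off everything to the right of the #, an affected row always ends with an empty last field. If KTE1 is populated for every line item in the source system (an assumption the sample data cannot confirm), testing that one column may be a cheaper check on 10 million rows than counting every column. A minimal sketch against the same #HiddenHash table:

```sql
-- Assumption: KTE1 always has a value in the source system,
-- so a blank or NULL KTE1 means the row was cut off somewhere before it.
SELECT LINE, PANO, INOW, EASLN, EBSAP, LIM1IT, NOMIT, VALUE, KTE1
FROM #HiddenHash
WHERE KTE1 = ''
   OR KTE1 IS NULL;
```

For the sample rows inserted above, this returns the same four lines (4, 5, 7 and 10). If KTE1 can legitimately be blank in the source system, the column-count comparison is the safer test.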
