
Inconsistent line endings in SSIS Flat File import

I have a large, pipe-delimited text file with no text qualifiers, and it looks like whatever spit out this file accidentally spit out false "LF" markers in the last column every few hundred rows. The last column is a descriptive column, and it is not text-qualified in any way, as it should be. The file looks similar to this:

id|data|data|data|data|Description[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|Descr[LF]
iption[LF]
id|data|data|data|data|Description[LF]
Id|data|data|data|data|Description[LF]
id|data|data|data|data|Descripti[LF]
on[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|D[LF]
escription[LF]

I'm pretty new to SSIS and SQL in general. Does anyone have any advice on how to fix this?

I did actually find a way to fix it in Notepad++, because I don't know C# and I don't know SSIS well enough.

The ID was 8 digits long and followed by 7 blank spaces. That pattern was absolutely unique to this file.

In Notepad++ I used the Extended search mode to find "\n" (LF) and replace it with nothing.

Then I switched to Regular expression mode and used this expression for Find:

(\d\d\d\d\d\d\d\d[[:blank:]][[:blank:]][[:blank:]][[:blank:]][[:blank:]][[:blank:]][[:blank:]])

to find all 8-digit numbers with 7 trailing spaces, and for Replace I used this:

\r\n\1

to put a [CR][LF] in front of each of those 8-digit numbers.
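
For anyone who would rather script it, the same two steps can be sketched in Python. This is only a rough equivalent of the Notepad++ procedure, not something from the original workflow. The function name repair_line_endings is made up for illustration, and it relies on the same assumption as above: an 8-digit ID followed by 7 blanks appears only at the start of a record.

```python
import re

# Assumption taken from the post: every record starts with an 8-digit ID
# followed by exactly 7 blanks, and that pattern occurs nowhere else.
# [ \t] stands in for the [[:blank:]] class used in Notepad++.
RECORD_START = re.compile(r"(\d{8}[ \t]{7})")


def repair_line_endings(text: str) -> str:
    """Remove every LF, then start a new CR LF line before each record ID."""
    # Step 1: drop all LF characters, the real ones and the stray ones alike.
    flat = text.replace("\n", "")
    if not flat:
        return ""
    # Step 2: put CR LF in front of each 8-digit ID with 7 trailing blanks.
    repaired = RECORD_START.sub(r"\r\n\1", flat)
    # The first record also receives a leading CR LF; strip it and
    # terminate the final record instead.
    return repaired.lstrip("\r\n") + "\r\n"
```

When reading and writing the real file, opening it with newline="" keeps Python from translating the line endings on its own. Like the Notepad++ version, this breaks down if the ID pattern can also appear inside a description.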

Lo and behold, it worked! Either way, my boss contacted the client and is requesting a better file. Now I get kudos, and we get proper data. Thanks for the advice, all!

If I had to take a guess, I would say this is occurring because of how the file is created: your data probably happens to include certain special characters that are being incorrectly interpreted as a line feed.

Check this site to see if the data within your problem lines matches any of these encodings. If this is the case, then ultimately you have two options available:

1) Create some elaborate ETL process to detect and correct the file data before you process it. This is inadvisable, as it will be a major pain to create and maintain.

2) Try changing the way this file is produced. Most text export wizards will allow you to place quotes (") around text items so that your import process can quickly detect something as a text block as opposed to a series of encoded characters to interpret.
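
To make the two options a bit more concrete, here is a minimal Python sketch of each. None of these names come from the original post; merge_continuations, write_quoted and read_quoted are hypothetical helpers. The merge step assumes six columns per record (five pipes), as in the sample in the question, and assumes a description never contains a pipe. It is a narrow fix for this one file shape, not a general ETL process.

```python
import csv
import io

EXPECTED_DELIMITERS = 5  # assumption: six columns, as in the sample rows


def merge_continuations(lines, delimiter="|", expected=EXPECTED_DELIMITERS):
    """Option 1: glue any line with too few delimiters onto the previous record."""
    records = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if records and line.count(delimiter) < expected:
            # Too few pipes: treat it as the tail of a description
            # that was split by a stray LF.
            records[-1] += line
        else:
            records.append(line)
    return records


def write_quoted(rows, delimiter="|"):
    """Option 2: quote every field at export so embedded LFs stay inside it."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(
        buffer, delimiter=delimiter, quoting=csv.QUOTE_ALL, lineterminator="\r\n"
    )
    writer.writerows(rows)
    return buffer.getvalue()


def read_quoted(text, delimiter="|"):
    """Read quoted output back; a quoted field may span several physical lines."""
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
```

With quoted output, a description containing an LF round-trips as a single field. On the SSIS side, the Flat File Connection Manager has a Text qualifier setting that would need to be set to the quote character for such a file.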
