File transfer from Windows to Linux

Question

I am exporting data to a CSV file using SSIS. In my SSIS package I compress the file in zip format and upload it to a Linux server using SFTP. The problem is that on the destination file system the CSV files include a ^M character, which comes from the DOS line endings.

I found three solutions.

First, I could set the SFTP transfer mode to ASCII and not zip the file (I later found out this is only supported by FTP). Considering that my unzipped file is > 3 GB, that is not efficient; the upload would take ages.
Second, once the file is transferred I could unzip it and convert it using the dos2unix utility, but dos2unix is not installed and I am not authorized to install it on the target system.
Finally, I could use a Unix editor like sed to remove ^M from the ends of lines. My file consists of more than 4 million lines, so this would again take ages.

Q: Is there any way to preformat my file in ASCII using SSIS, then zip and transfer?

Answer 1

I haven't tried it, but I think you could do the CR+LF -> LF conversion at the moment you write the CSV file. I looked at this link.

Scroll down to the section "Header row delimiter". It seems that if you choose {LF} as the row delimiter, the CSV inside the resulting .zip file should display correctly on your Linux box.

BTW, you probably know this, but I have to mention that ^M is how CR is displayed on a Linux / Unix box.

BTW2, in most cases the ^M on Linux is not a real problem, just an annoyance.

I hope this helps!

Answer 2

While searching on this issue I found some very useful links that describe the cause and possible resolutions:

How to remove CTRL-M (^M) characters from a file in Linux
Why are special characters such as "carriage return" represented as "^M"?

Cause

The file has been transferred between systems of different types with different newline conventions. For example, Windows-based text editors end each line with a carriage return followed by a line feed (CR+LF), and on Linux the extra CR is displayed as ^M. This can be difficult to spot, as some applications or programs handle the foreign newline characters properly while others do not. Thus some services may crash or not respond correctly. Often this is because the file was created, or perhaps even just edited, on a Microsoft Windows machine and then uploaded or transferred to a Linux server. This typically occurs when a file is transferred from MS-DOS (or MS-Windows) without ASCII or text mode.
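To confirm that a file really carries CR+LF line endings before choosing a fix, the lines ending in CR can be counted on the Linux side. The following is only a minimal sketch: the function name count_crlf_lines and the sample file mixed.csv are made up for illustration, and it assumes a standard grep.

```shell
#!/bin/sh
# count_crlf_lines FILE
# Print how many lines of FILE end with a carriage return (shown as ^M).
count_crlf_lines() {
    # printf '\r' produces a literal CR byte; the trailing $ anchors it at end of line.
    grep -c "$(printf '\r')\$" "$1"
}

# Small sample: two DOS-style lines and one Unix-style line.
printf 'a,b\r\nc,d\r\ne,f\n' > mixed.csv
count_crlf_lines mixed.csv   # prints 2
```

A count of zero means the file already uses LF only and needs no conversion.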

Possible resolutions

(1) Using dos2unix command

dos2unix includes utilities to convert text files with DOS or Mac line breaks to Unix line breaks and vice versa. It can also convert UTF-16 to UTF-8.

You can run a command like the following from an Execute Process Task:

dos2unix filename
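Where dos2unix is not installed, as on the asker's target system, the same CR+LF to LF conversion can be approximated with tr, which is part of POSIX. This is a minimal sketch rather than the method from the linked articles; the function name strip_cr and the sample file names are hypothetical. Note that it deletes every CR byte, so it assumes the CSV has no bare CRs inside field values.

```shell
#!/bin/sh
# strip_cr: read stdin, write stdout, deleting every carriage return.
# It streams the data, so the whole file is never held in memory.
strip_cr() {
    tr -d '\r'
}

# Demo on a small sample instead of the real multi-gigabyte export.
printf 'id,name\r\n1,alice\r\n' > sample_crlf.csv
strip_cr < sample_crlf.csv > sample_lf.csv
```

Because the conversion is a single sequential pass, its running time is usually governed by disk throughput rather than by the number of lines.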

(2) Data Flow Task

You can create a Data Flow Task that transfers data from a Flat File Source into a new Flat File Destination, where both Flat File Connection Managers have the same structure except for the Row Delimiter property ({CR}{LF} in the source, {LF} in the destination).

Flat File Connection Manager Editor (Columns Page)

(3) Using a Script Task - StreamReader/Writer

You can use a Script Task with code similar to this:

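// Requires: using System.IO;
// Stream the file one character at a time (the reader buffers internally),
// so a multi-gigabyte CSV is never loaded into memory as a single string.
string tempFile = FileName + ".tmp";
using (StreamReader reader = new StreamReader(FileName))
using (StreamWriter writer = new StreamWriter(tempFile))
{
    bool pendingCr = false;
    int c;
    while ((c = reader.Read()) != -1)
    {
        // A CR that is not followed by LF is kept as-is
        if (pendingCr && c != '\n') writer.Write('\r');
        pendingCr = (c == '\r');
        if (!pendingCr) writer.Write((char)c);
    }
    if (pendingCr) writer.Write('\r');
}
// Replace the original file with the converted copy
File.Delete(FileName);
File.Move(tempFile, FileName);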

Replacing LF with CRLF in text file

(4) Extract using unzip -a

From the unzip documentation:

-a

convert text files. Ordinarily all files are extracted exactly as they are stored (as "binary" files). The -a option causes files identified by zip as text files (those with the 't' label in zipinfo listings, rather than 'b') to be automatically extracted as such, converting line endings, end-of-file characters and the character set itself as necessary. (For example, Unix files use line feeds (LFs) for end-of-line (EOL) and have no end-of-file (EOF) marker; Macintoshes use carriage returns (CRs) for EOLs; and most PC operating systems use CR+LF for EOLs and control-Z for EOF. In addition, IBM mainframes and the Michigan Terminal System use EBCDIC rather than the more common ASCII character set, and NT supports Unicode.) Note that zip's identification of text files is by no means perfect; some "text" files may actually be binary and vice versa. unzip therefore prints "[text]" or "[binary]" as a visual check for each file it extracts when using the -a option. The -aa option forces all files to be extracted as text, regardless of the supposed file type. On VMS, see also -S.

So you can use the following command to extract text files while converting their line endings:

unzip -a filename
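Since zip's text detection is not perfect, a file flagged as binary would come out of unzip -a unchanged. One hedged alternative is to stream the archive member through tr during extraction, so no intermediate CR+LF copy is written to disk. This is only a sketch: the function name extract_as_lf and the file names in the usage line are hypothetical, and it assumes Info-ZIP unzip (for the -p option) is present on the target.

```shell
#!/bin/sh
# extract_as_lf ARCHIVE MEMBER OUTPUT
# Extract one member of a zip archive and convert CR+LF to LF on the fly.
extract_as_lf() {
    # unzip -p writes the member's bytes to stdout without any conversion;
    # tr then deletes every CR before the data reaches the output file.
    unzip -p "$1" "$2" | tr -d '\r' > "$3"
}

# Hypothetical usage:
#   extract_as_lf export.zip export.csv export_lf.csv
```

Like any blanket CR removal, this assumes the CSV has no carriage returns inside field values.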

Credit to @jww's comment.

Answer 1: 2019-02-04 15:17:30
Answer 2 (accepted): 2019-02-04 21:45:51