
Differences between unix and windows files

Am I correct in assuming that the only difference between "windows files" and "unix files" is the linebreak?

We have a system that has been moved from a windows machine to a unix machine, and we are having trouble with the file format.

I need to automate the translation between unix/windows before the files get delivered to the system in our "transportsystem". I'll probably need something to determine the current format and something to transform it into the other format. If it's just the newline that's the big difference, then I'm considering just reading the files with java.io. As far as I know, its classes are able to handle both formats with readLine. And then just write each line back with

while ((line = reader.readLine()) != null)
    writer.write(line + newlineInOtherFormat);
....
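The loop in the question could be fleshed out roughly as follows. This is only a sketch: the class name NewlineConverter, its constants, and its method names are made up for illustration. It relies on BufferedReader.readLine() treating \n, \r, and \r\n as line terminators.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical helper; the names are illustrative, not from any library.
final class NewlineConverter {
    static final String UNIX = "\n";
    static final String WINDOWS = "\r\n";

    // Copies the text line by line, ending every line with targetNewline.
    static void convert(Reader in, Writer out, String targetNewline) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        BufferedWriter writer = new BufferedWriter(out);
        String line;
        while ((line = reader.readLine()) != null) {
            writer.write(line);
            writer.write(targetNewline);
        }
        writer.flush();
    }

    // Convenience wrapper for text that is already in memory.
    static String convert(String text, String targetNewline) {
        StringWriter out = new StringWriter();
        try {
            convert(new StringReader(text), out, targetNewline);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

One caveat of this approach: the output always ends with a newline, even if the last line of the input did not have one. For real files, the Reader and Writer should be created with an explicit encoding.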

Summary:

samjudson

This is only a difference in text files: UNIX uses a single Line Feed (LF) to signify a new line, Windows uses a Carriage Return/Line Feed (CRLF) and Mac uses just a CR.

to which Cebjyre elaborates:

OS X uses LF, the same as UNIX. MacOS 9 and below did use CR, though.
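To determine which of these conventions a piece of text uses, one option is simply to count the terminators. The sketch below assumes the text has already been decoded into a String; NewlineDetector and its Style enum are hypothetical names chosen for this example.

```java
// Hypothetical helper for illustration; not part of any library.
final class NewlineDetector {
    enum Style { LF, CRLF, CR, MIXED, NONE }

    static Style detect(String text) {
        int lf = 0, crlf = 0, cr = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r') {
                if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    crlf++;
                    i++; // skip the LF that belongs to this CRLF pair
                } else {
                    cr++;
                }
            } else if (c == '\n') {
                lf++;
            }
        }
        // Count how many different conventions appear in the text.
        int kinds = (lf > 0 ? 1 : 0) + (crlf > 0 ? 1 : 0) + (cr > 0 ? 1 : 0);
        if (kinds == 0) return Style.NONE;
        if (kinds > 1) return Style.MIXED;
        if (lf > 0) return Style.LF;
        return crlf > 0 ? Style.CRLF : Style.CR;
    }
}
```

A file reported as MIXED or NONE probably deserves a closer look before any automatic conversion is applied.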

Mo

There could also be a difference in character encoding for national characters. There is no "unix-encoding", but many linux variants use UTF-8 as the default encoding. Mac OS (which is also a unix) uses its own encoding (macroman). I am not sure what the windows default encoding is.

McDowell

In addition to the new-line differences, the byte-order mark can cause problems if files are treated as Unicode on Windows.
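In Java, a UTF-8 byte-order mark is not dropped by the decoder; it shows up as the character U+FEFF at the start of the text. A minimal sketch of removing it, assuming the file content is UTF-8 and already in a byte array (BomStripper is a made-up name):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration.
final class BomStripper {
    private static final char BOM = '\uFEFF';

    // Removes a single leading byte-order mark, if there is one.
    static String stripBom(String text) {
        if (!text.isEmpty() && text.charAt(0) == BOM) {
            return text.substring(1);
        }
        return text;
    }

    // Decodes UTF-8 bytes and drops the BOM that some Windows tools prepend.
    static String decodeUtf8(byte[] bytes) {
        return stripBom(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

Files saved as UTF-16 carry a different byte sequence as their BOM and would need to be decoded with the matching charset instead.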

Cheekysoft

However, another set of problems that you may come across can be related to single/multi-byte character encodings. If you see strange unexpected chars (not at end-of-line), then this could be the reason, especially if you see square boxes, question marks, upside-down question marks, extra characters or unexpected accented characters.
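The symptoms described above can be reproduced by writing text in one encoding and reading it back in another. The sketch below is only a demonstration; MojibakeDemo and misread are hypothetical names.

```java
import java.nio.charset.Charset;

// Hypothetical demo class for illustration.
final class MojibakeDemo {
    // Encodes text with one charset and decodes the bytes with another,
    // which is what happens when a file is read with the wrong encoding.
    static String misread(String text, Charset writtenAs, Charset readAs) {
        byte[] bytes = text.getBytes(writtenAs);
        return new String(bytes, readAs);
    }
}
```

For example, an "é" written as UTF-8 occupies two bytes; read back as ISO-8859-1 it turns into two unexpected accented characters. In the other direction the single ISO-8859-1 byte is not valid UTF-8, so the decoder substitutes the replacement character, which many fonts render as a box or question mark.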

Sadie

On unix, files that start with a . are hidden. On windows, it's a filesystem flag that you probably don't have easy access to. This may result in files that are supposed to be hidden now becoming visible on the client machines.

File permissions vary between the two. You will probably find, when you copy files onto a unix system, that the files now belong to the user that did the copying and have limited rights. You'll need to use chown/chmod to make sure the correct users have access to them.
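If the transfer step is written in Java anyway, the chmod part can be done from the same program with java.nio.file. The following is a sketch under the assumption that the target filesystem supports POSIX attributes; TransferredFilePermissions and its methods are invented names, and the permission string is only an example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical helper for illustration.
final class TransferredFilePermissions {
    // True when the default filesystem understands unix-style permissions.
    static boolean posixSupported() {
        return FileSystems.getDefault().supportedFileAttributeViews().contains("posix");
    }

    // Applies a symbolic permission string such as "rw-r--r--".
    // Returns false without touching the file on non-POSIX filesystems.
    static boolean apply(Path file, String symbolic) {
        if (!posixSupported()) {
            return false;
        }
        Set<PosixFilePermission> perms = PosixFilePermissions.fromString(symbolic);
        try {
            Files.setPosixFilePermissions(file, perms);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The unix convention for hidden files: the name starts with a dot.
    static boolean isDotFile(Path file) {
        Path name = file.getFileName();
        return name != null && name.toString().startsWith(".");
    }
}
```

Changing the owner, the counterpart of chown, is not shown here and usually requires elevated privileges.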

There exist tools to help with the problem:

pauldoo

If you are just interested in the content of text files, then yes, the line endings are different. Take a look at something like dos2unix; it may be of help here.

Cheekysoft

As pauldoo suggests, tools like dos2unix can be very useful. Note that these may be on your linux/unix system as fromdos or tofrodos, or perhaps even as the general purpose toolbox recode.

Help for Java coding

Cheekysoft

When writing to files or reading from files (that you are in control of), it is often worth specifying the encoding to use, as most Java methods allow this. However, also ensuring that the system locale matches can save a lot of pain.
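As a sketch of what specifying the encoding can look like, the java.nio.file.Files factory methods take a Charset argument for both reading and writing. The class name ExplicitEncodingCopy is hypothetical, and which charsets the files actually use is something you have to know or find out; a Windows source might, for instance, be windows-1252.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration.
final class ExplicitEncodingCopy {
    // Reads source with one explicit charset and writes target with another,
    // so the result does not depend on the platform default encoding.
    static void transcode(Path source, Charset sourceCharset,
                          Path target, Charset targetCharset) {
        try (BufferedReader reader = Files.newBufferedReader(source, sourceCharset);
             BufferedWriter writer = Files.newBufferedWriter(target, targetCharset)) {
            char[] buffer = new char[8192];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This version copies characters unchanged, so line endings are preserved. Note that the reader created this way throws an exception on bytes that are invalid in the source charset rather than silently replacing them.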

This is only a difference in text files: UNIX uses a single Line Feed (LF) to signify a new line, Windows uses a Carriage Return/Line Feed (CRLF) and Mac uses just a CR.

For binary files there should be no difference (i.e. a JPEG on a windows machine will be byte-for-byte the same as the same JPEG on a unix box).

There could also be a difference in character encoding for national characters. There is no "unix-encoding", but many linux variants use UTF-8 as the default encoding. Mac OS (which is also a unix) uses its own encoding (macroman). I am not sure what the windows default encoding is.

But this could be another source of trouble (apart from the different linebreaks).

What are your problems? The linebreak-related problems can be easily corrected with the programs dos2unix or unix2dos on the unix machine.

If you are just interested in the content of text files, then yes, the line endings are different. Take a look at something like dos2unix; it may be of help here.

(Of course there are many other things that make unix and windows files different, but I don't think you're interested in those other differences right now.)

In addition to the answers given, you may find issues with the different file systems:

  • On unix, files that start with a . are hidden. On windows, it's a filesystem flag that you probably don't have easy access to. This may result in files that are supposed to be hidden now becoming visible on the client machines.

  • File permissions vary between the two. You will probably find, when you copy files onto a unix system, that the files now belong to the user that did the copying and have limited rights. You'll need to use chown/chmod to make sure the correct users have access to them.

In addition to the new-line differences, the byte-order mark can cause problems if files are treated as Unicode on Windows.

As pauldoo suggests, tools like dos2unix can be very useful. Note that these may be on your linux/unix system as fromdos or tofrodos, or perhaps even as the general purpose toolbox recode.

However, another set of problems that you may come across can be related to single/multi-byte character encodings. If you see strange unexpected chars (not at end-of-line), then this could be the reason, especially if you see square boxes, question marks, upside-down question marks, extra characters or unexpected accented characters.

Running the command locale on your *nix box will tell you what the system locale is. If this is different from the encoding used in the text files that have been transferred over from the windows machine, then this can sometimes cause issues, depending on the usage of those files. You can use the very powerful recode command to try to convert between the different charsets and to fix any line-ending issues. recode -l will show you all of the formats and encodings that the tool can convert between. It is likely to be a VERY long list.
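From inside a Java program, a rough counterpart to checking the locale and listing convertible encodings is to ask the JVM for its default locale, its default charset, and the charsets it supports. This reports the JVM's view only, not the output of locale or recode, and CharsetReport is a name made up for this sketch.

```java
import java.nio.charset.Charset;
import java.util.Locale;
import java.util.SortedMap;

// Hypothetical helper for illustration.
final class CharsetReport {
    // Summarizes what this JVM assumes when no encoding is specified.
    static String describe() {
        SortedMap<String, Charset> available = Charset.availableCharsets();
        return "default locale: " + Locale.getDefault()
                + ", default charset: " + Charset.defaultCharset().name()
                + ", charsets known to this JVM: " + available.size();
    }

    // Checks whether a named encoding can be used for conversion.
    static boolean isSupported(String charsetName) {
        return Charset.isSupported(charsetName);
    }
}
```

Logging the result of describe() on both the old Windows machine and the new unix machine is a quick way to see whether the defaults differ.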

When writing to files or reading from files (that you are in control of), it is often worth specifying the encoding to use, as most Java methods allow this. However, also ensuring that the system locale matches can save a lot of pain.
