sort: string comparison failed: Invalid or incomplete multibyte or wide character

Question

I'm trying to use the following command on a text file:

$ sort <m.txt | uniq -c | sort -nr >m.dict

However, I get the following error message:

sort: string comparison failed: Invalid or incomplete multibyte or wide character
sort: Set LC_ALL='C' to work around the problem.
sort: The strings compared were ‘enwedig\r’ and ‘mwy\r’.

I'm using Cygwin on Windows 7 and was having trouble earlier editing m.txt to put each word within the file on a new line. Please see:

Using AWK to place each word in a text file on a new line

I'm not sure whether I'm getting these errors because of this, or because m.txt contains characters from the Welsh alphabet (when I was working with Welsh text in Python, I was required to change the encoding to 'Latin-1').

I tried following the error message's advice and setting LC_ALL='C', but this has not helped. Can anyone elaborate on the errors I'm receiving and advise how I might go about solving this problem?

UPDATE:

When I tried dos2unix, it reported invalid characters on certain lines. It turns out these were not Welsh characters but other strange characters (arrows etc.). I went through my text file removing these characters until I was able to run dos2unix without error. However, after running dos2unix all the text appeared concatenated (no spaces or newlines, whereas each word in the file should have been on a separate line). I then used unix2dos and the text file was back to normal. How can I put each word on its own line and use the sort command without it giving me errors about '\r' characters?

Answer 1

I know it's an old question, but just running the command export LC_ALL='C' does the trick, as suggested by the message sort: Set LC_ALL='C' to work around the problem.
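A minimal sketch of that workaround follows. One possible reason it did not help in the question is that the variable was assigned without export, or applied to only one of the two sort calls; that is an assumption, not something the question confirms. The file names m_sample.txt and m_sample.dict are hypothetical stand-ins for m.txt and m.dict.

```shell
# Hypothetical sample standing in for m.txt: one word per line, CRLF endings.
printf 'mwy\r\nenwedig\r\nmwy\r\n' > m_sample.txt

# Export once so that both sort invocations in the pipeline use the C locale.
export LC_ALL=C

# Count occurrences of each word, most frequent first.
sort < m_sample.txt | uniq -c | sort -nr > m_sample.dict
cat m_sample.dict
```

In the C locale sort compares raw bytes, so it no longer rejects invalid multibyte sequences, but accented letters are ordered by byte value rather than alphabetically, and the trailing '\r' stays on every word.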

Answer 2

Looks like a Windows line-ending related problem (\r\n versus \n). You can convert m.txt to Unix line endings with

dos2unix m.txt

and then rerun your command.
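If dos2unix is unavailable or rejects the file, deleting the carriage returns with tr is a common alternative. This is a swapped-in technique, not what the answer used, and the file names below are hypothetical. Text that looks run together after conversion may simply be an editor that does not display bare \n line breaks; that is an assumption about the asker's setup.

```shell
# Hypothetical sample standing in for m.txt, with Windows (CRLF) line endings.
printf 'mwy\r\nenwedig\r\nmwy\r\n' > m_crlf.txt

# Delete every carriage return; the line feeds that separate the words remain.
tr -d '\r' < m_crlf.txt > m_unix.txt

# Count occurrences of each word, most frequent first.
# LC_ALL=C is set per command here so the rest of the shell session is unaffected.
LC_ALL=C sort < m_unix.txt | uniq -c | LC_ALL=C sort -nr > m_unix.dict
cat m_unix.dict
```

Running wc -l on the converted file is a quick way to confirm that each word is still on its own line, regardless of how an editor displays it.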