
sort: string comparison failed: Invalid or incomplete multibyte or wide character

I'm trying to run the following command on a text file:

$ sort <m.txt | uniq -c | sort -nr >m.dict 

However, I get the following error message:

sort: string comparison failed: Invalid or incomplete multibyte or wide character
sort: Set LC_ALL='C' to work around the problem.
sort: The strings compared were ‘enwedig\r’ and ‘mwy\r’.

I'm using Cygwin on Windows 7, and earlier I had trouble editing m.txt to put each word in the file on its own line. Please see:

Using AWK to place each word in a text file on a new line

I'm not sure whether I'm getting these errors because of that, or because m.txt contains characters from the Welsh alphabet (when I was working with Welsh text in Python, I had to change the encoding to 'Latin-1').

I tried following the error message's advice and setting LC_ALL='C', but this has not helped. Can anyone explain the errors I'm receiving and advise how I might solve this problem?

UPDATE:

When I tried dos2unix, it reported invalid characters on certain lines. It turns out these were not Welsh characters but other strange characters (arrows etc.). I went through my text file removing them until dos2unix ran without error. However, after running dos2unix all the text appeared concatenated (no spaces or newlines at all, whereas each word in the file should have been on a separate line). I then ran unix2dos and the text file was back to normal. How can I put each word on its own line and use the sort command without it giving me errors about '\r' characters?

I know it's an old question, but just running the command export LC_ALL='C' does the trick, as the error message itself suggests (sort: Set LC_ALL='C' to work around the problem.).
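One possible reason the workaround can appear not to help: a plain LC_ALL='C' assignment without export only sets a shell variable (unless the variable was already exported), so sort never sees it. As an alternative to export, the variable can be set just for the commands that need it. This is a minimal sketch, assuming a POSIX shell with the usual sort and uniq as found in Cygwin; sample.txt and sample.dict are hypothetical stand-ins for m.txt and m.dict.

```shell
# Hypothetical sample input with CRLF line endings, standing in for m.txt
printf 'mwy\r\nenwedig\r\nmwy\r\n' > sample.txt

# Put LC_ALL=C in the environment of each command in the pipeline,
# so strings are compared byte by byte instead of by locale rules
LC_ALL=C sort < sample.txt | LC_ALL=C uniq -c | LC_ALL=C sort -nr > sample.dict

cat sample.dict
```

Note that the carriage returns are still in the data after this; the C locale only stops sort from trying to decode them as part of a multibyte character.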

This looks like a Windows line-ending problem (\r\n versus \n). You can convert m.txt to Unix line endings with

dos2unix m.txt

and then rerun your command.
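If dos2unix is not installed, or it refuses the file as described in the update to the question, deleting the carriage returns with tr is one alternative. The following is only a sketch under the same assumptions as the original command; sample.txt and sample.dict are hypothetical stand-ins for m.txt and m.dict, and LC_ALL=C on the first sort is added here as a precaution against any remaining non-UTF-8 bytes.

```shell
# Hypothetical sample input with CRLF line endings, standing in for m.txt
printf 'mwy\r\nenwedig\r\nmwy\r\n' > sample.txt

# Delete every carriage return, then count and rank the words
tr -d '\r' < sample.txt | LC_ALL=C sort | uniq -c | sort -nr > sample.dict

# Dump the raw bytes to confirm that only \n line endings remain
od -c sample.dict
```

If the converted file looks like one long line in an editor such as Windows Notepad, that may only be the editor not rendering bare \n line endings; od -c shows what is really in the file.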
