
IntelliJ IDEA: Impossible to commit files: 'utf8' codec can't decode byte 0xcc in position 9

Original post: 2014-12-08 10:49:46. Tags: java, intellij-idea, mercurial, bitbucket

IntelliJ IDEA 14.0.1

Plugin: jetbrains-bitbucket-connector

I'm trying to commit files, but get the error:

Error:transaction abort!
rollback completed
abort: decoding near 'C:\\Users\\ \\AppDa': 'utf8' codec can't decode byte 0xcc in position 9: invalid continuation byte!

Has anyone encountered this error? How can it be solved?

Thanks.

3 Answers

This probably isn't the answer you're looking for, but it gives you some insight into what might be going on:

On most systems, file paths are made up of bytes, since file systems were designed decades before Unicode. Unicode is retrofitted to them by interpreting the bytes as UTF-8 encoded strings. Unfortunately, there is no way to say "this is Cp-1251" and "this is UTF-8" inside a file name. Therefore, the "convert file name to string" code relies on the platform's default encoding. NTFS solved the problem by always storing file names as Unicode (ignoring the local code page), but the names are translated into the local code page when you use a tool which displays them on screen.

And then comes Python 2, where Unicode was also retrofitted in a similar way. Python just has the advantage that you have two types of objects (str and unicode), so in theory you can tell raw bytes and Unicode apart. The problems start when you get a bunch of bytes from somewhere and the logic says "this should be Unicode", which happens when you read file names from disk.

In your case, the file system passes bytes which contain Cp1251 encoded characters to Python, but the Python code tries to read them as UTF-8 encoded Unicode. For ASCII characters (code points below 128) this works, because both encodings agree on them, but it breaks for bytes of 128 and above. 0xCC is a typical case: in Cp1251 it is a complete character (the Cyrillic letter М), while in UTF-8 it is the lead byte of a two-byte sequence and must be followed by a continuation byte in the range 0x80 to 0xBF. When the next byte is something else, you get "invalid continuation byte". This is why you see this error so often in Europe: we use those characters a lot.
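The mismatch described above can be reproduced in a few lines. This is a minimal sketch in Python 3 (Mercurial at the time ran on Python 2, where the codec is reported as 'utf8' rather than 'utf-8'). The user name `Миша` is purely hypothetical, since the real name is not visible in the question, and `try_utf8` is a helper made up for this illustration.

```python
# Hypothetical home directory; the real user name is not shown in the question.
path = "C:\\Users\\Миша\\AppData"

# The raw bytes a Cp1251 system would hand over for this path.
raw = path.encode("cp1251")


def try_utf8(data):
    """Return the UnicodeDecodeError raised when decoding data as UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err
    return None


err = try_utf8(raw)
print(err)                    # the decoder's complaint about the first non-ASCII byte
print(err.start)              # offset of the offending byte within the path
print(raw.decode("cp1251"))   # decoding with the matching code page succeeds
```

With this made-up name the offending byte sits at offset 9, directly after the nine ASCII bytes of `C:\Users\`, which matches the position reported in the question.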

Now the people who created Mercurial are well aware of all this. Most of the time, Mercurial should just work. See https://www.mercurial-scm.org/pipermail/mercurial/2009-January/023762.html

As I see it, your problem could be caused by:

Somehow, Windows used the local code page to create your home directory (unlikely).
Mercurial gets the path as Unicode but for some reason thinks that the string is raw bytes and tries to decode it using a UTF-8 decoder. Since the decoding is applied twice, this fails. Maybe you have an old version of Mercurial. Try to update.
Maybe you showed us the wrong part of the error message and the problem is actually in a file which you tried to commit. In that case, we can ignore the odd characters in the error message. Make sure you use the correct encoding when you edit the file.

To see which one it is, I suggest creating a folder C:\dev and working there. If this works, then there is something wrong with your home folder or Mercurial has a bug.
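Before moving the project, it may help to confirm that the path really contains non-ASCII components. The following is a rough sketch in Python 3.7+; `non_ascii_parts` is a hypothetical helper, and the example user name is again an assumption rather than the asker's real one.

```python
from pathlib import PureWindowsPath


def non_ascii_parts(path):
    """Return the components of a Windows path that contain non-ASCII characters."""
    return [part for part in PureWindowsPath(path).parts if not part.isascii()]


# Hypothetical home directory with a Cyrillic user name.
print(non_ascii_parts(r"C:\Users\Миша\AppData"))

# A plain ASCII working folder, as suggested above.
print(non_ascii_parts(r"C:\dev"))
```

An empty result for the working folder means no component of that path depends on the code page, so an error that persists there would point at Mercurial or at the committed files instead.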

The error says that your file path contains a few bytes that are not valid UTF-8, so the decoder cannot decode the path and aborts the operation.

Look at the characters in the path and correct any unknown characters you find there:

'C:\\Users\\ \\AppDa'

Here the characters after Users\\ are the ones that could not be decoded as UTF-8.

Edit: check your string with this tool (link to tool) to see which encoding your characters are in.

You could then use that encoding, but this is not a practical solution, because it varies by platform and language. Use the UTF-16 character set instead, which covers a large set of characters.
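If no online tool is at hand, a similar check can be approximated locally by trying a few candidate encodings. This is only a sketch in Python 3: `decode_candidates` is a made-up helper, the candidate list is an assumption, and the sample bytes are hypothetical. Single-byte code pages accept almost any input, so a successful decode is a hint, not proof.

```python
def decode_candidates(data, encodings=("utf-8", "cp1251", "cp1252")):
    """Try each encoding and map its name to the decoded text, or None on failure."""
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Hypothetical bytes of a user name, starting with the 0xCC from the error message.
sample = b"\xcc\xe8\xf8\xe0"

for name, text in decode_candidates(sample).items():
    print(name, repr(text))
```

The candidate whose output reads as a real name is the likely encoding; the others either fail or produce mojibake.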

I had the same problem. If you also use Mercurial, here is the solution:

Go to [project directory]/.hg
Open the "hgrc" file
Below [ui], insert username = my_name_only_utf_characters <mail@example.com>
Save and commit
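The steps above boil down to a two-line `[ui]` section. As a sketch in Python 3.7+, the hypothetical helper below builds that section and rejects user names with non-ASCII characters. Reading "only utf characters" as "plain ASCII" is an assumption on my part, not something the answer states.

```python
def ui_section(username):
    """Build the [ui] section text for an hgrc file, rejecting non-ASCII user names."""
    if not username.isascii():
        raise ValueError("username contains non-ASCII characters: %r" % username)
    return "[ui]\nusername = %s\n" % username


# Placeholder name and address taken from the answer above.
print(ui_section("my_name_only_utf_characters <mail@example.com>"))
```

The printed text is what would go into [project directory]/.hg/hgrc; if the file already has a `[ui]` section, only the `username` line needs to be added to it.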