How to copy files with special characters in their names with TCL's exec?

I'm trying to upload files with special characters in their names to our platform via the exec command, but the characters are always interpreted and the upload fails.

For example, if I try to upload a mémo.txt file, I get the following error:

/bin/cp: cannot create regular file `/path/to/dir/m\351mo.txt': No such file or directory

UTF-8 is correctly configured on the system, and if I run the command in the shell it works fine.

Here is the TCL code:

exec /bin/cp $tmp_filename $dest_path

How can I make it work?

The core of the problem is which encoding is used to communicate with the operating system. For exec and filenames, that encoding is whatever the encoding system command returns (Tcl makes a pretty good guess at the correct value when the Tcl library starts up, but very occasionally gets it wrong). On my computer, that command returns utf-8, which says (correctly!) that strings passed to (and received from) the OS are UTF-8.
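As a quick check, the sketch below prints the encoding Tcl currently uses for the OS and shows which bytes a name like mémo.txt turns into under two candidate encodings. The proc name name_bytes is made up for illustration, and binary encode hex assumes Tcl 8.6 or later.

```tcl
# Hypothetical helper: hex dump of the bytes a file name becomes in a given encoding.
proc name_bytes {name enc} {
    return [binary encode hex [encoding convertto $enc $name]]
}

puts "system encoding: [encoding system]"
# \u00e9 is é, written as an escape so the script does not depend on its own file encoding.
puts "utf-8:     [name_bytes "m\u00e9mo.txt" utf-8]"
puts "iso8859-1: [name_bytes "m\u00e9mo.txt" iso8859-1]"
```

If the bytes shown for the value reported by encoding system are not the bytes the filesystem expects, that mismatch is the thing to fix.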

You should be able to use the file copy command instead of exec /bin/cp, which helps here because it has fewer layers of trickiness (it avoids going through an external program that can impose its own problems). We'll assume that that's being done:

set tmp_filename "foobar.txt";  # <<< fill in the right value, of course
set dest_path "/path/to/dir/mémo.txt"
file copy $tmp_filename $dest_path

If that fails, we need to work out why. The most likely problems relate to the encoding, and they can go wrong in multiple ways that interact horribly. Alas, the details matter. In particular, the encoding for a path depends on the actual filesystem (it's formally a parameter when the filesystem is created) and can vary on Unix between parts of a path when you have a mount within another mount.
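A minimal sketch for gathering some of that context when the copy fails. The wrapper name copy_with_report is hypothetical; it only adds the current encoding system value and a check that the destination directory exists to the error text that file copy already produces.

```tcl
# Hypothetical wrapper around [file copy] that reports context on failure.
proc copy_with_report {src dst} {
    # "--" stops option parsing, so names starting with "-" are treated as paths.
    if {[catch {file copy -- $src $dst} msg]} {
        set dir [file dirname $dst]
        return "copy failed: $msg (encoding system: [encoding system],\
                destination directory exists: [file isdirectory $dir])"
    }
    return "ok"
}
```

If the directory check says the directory exists and the copy still fails with "no such file or directory", a wrongly encoded name becomes the more likely suspect.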

If the worst comes to the worst, you can put Tcl into ISO 8859-1 mode and then do all the encoding yourself (as ISO 8859-1 is the “just use the bytes I tell you” encoding); encoding convertto is also useful in this case. Be aware that this can generate filenames that cause trouble for other programs, but it at least lets you get at the file.

encoding system iso8859-1
file copy $tmp_filename [encoding convertto utf-8 $dest_path]

Care might be needed to convert different parts of the path correctly in this case: you're taking full responsibility for what's going on.
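One way to convert the parts separately can be sketched as follows. The proc encode_path_components and its encodings argument are made up for illustration; which encoding each component really needs depends on your mounts, and the sketch assumes encoding system has already been set to iso8859-1 so that the resulting bytes are passed through unchanged.

```tcl
# Hypothetical helper: encode each path component with its own encoding.
# encs holds one encoding name per element of [file split $path];
# if it is shorter than the path, the last entry is reused for the rest.
proc encode_path_components {path encs} {
    set out {}
    set i 0
    foreach part [file split $path] {
        set enc [lindex $encs [expr {min($i, [llength $encs] - 1)}]]
        lappend out [encoding convertto $enc $part]
        incr i
    }
    return [file join {*}$out]
}

# Intended use (not run here):
#   encoding system iso8859-1
#   file copy $tmp_filename [encode_path_components $dest_path {utf-8}]
```

With a single entry such as {utf-8}, this gives the same bytes as converting the whole path in one go; the list only matters when different mounts along the path use different encodings.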


If you're on Windows, please just let Tcl handle the details. Tcl uses the Wide (Unicode) Windows API directly so you can pretend that none of these problems exist. (There are other problems instead.)

On macOS, please leave encoding system alone as it is correct. Macs have a very opinionated approach to encodings.

I already tried the file copy command, but it fails with: error copying "/tmp/file7k5kqg" to "/path/to/dir/mémo.txt": no such file or directory

My reading of your problem is that, for some reason, your Tcl is set to iso8859-1 ([encoding system]), while the executing environment (shell) is set to utf-8. This explains why Donal's suggestion works for you:

encoding system iso8859-1
file copy $tmp_filename [encoding convertto utf-8 $dest_path]

This safely passes a utf-8 encoded byte array down to any syscall: é becomes \xc3\xa9. Watch:

% binary encode hex [encoding convertto utf-8 é] 
c3a9
% encoding system iso8859-1; exec xxd << [encoding convertto utf-8 é] 
00000000: c3a9                                     ..

This is equivalent to [encoding system] also being set to utf-8 (as is to be expected in an otherwise utf-8 environment):

% encoding system
utf-8
% exec xxd << é
00000000: c3a9                                     ..

What you are experiencing (without any intervention) seems to be a re-coding from Tcl's internal encoding to iso8859-1 on the way out of Tcl (because of [encoding system], as Donal describes), followed by a faulty interpretation of this iso8859-1 value in the utf-8 environment.

Watch the difference (\xe9 vs. \xc3\xa9):

% encoding system iso8859-1
% encoding system
iso8859-1
%  exec xxd << é
00000000: e9
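The 351 that /bin/cp prints in the error message from the question is the same byte, written in octal. A small check, with a made-up proc name:

```tcl
# Hypothetical helper: turn an octal escape (digits only) into two hex digits.
proc octal_to_hex {digits} {
    scan $digits %o value
    return [format %02x $value]
}

puts [octal_to_hex 351]
```

This prints e9, which supports the reading that Tcl handed the iso8859-1 byte for é to cp rather than the two utf-8 bytes.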

The problem then seems to be that \xe9 gets interpreted in your otherwise utf-8 environment, like this:

$ locale
LANG="de_AT.UTF-8"
...
$ echo -ne '\xe9'
?
$ touch `echo -ne 'm\xe9mo.txt'`
touch: m?mo.txt: Illegal byte sequence
$ touch mémo.txt
$ ls mémo.txt 
mémo.txt
$ cp `echo -ne 'm\xe9mo.txt'` b.txt
cp: m?mo.txt: No such file or directory

But:

$ cp `echo -ne 'm\xc3\xa9mo.txt'` b.txt
$ ls b.txt
b.txt

Your options:

(1) To begin with, you need to find out why Tcl picks up iso8859-1. How did you obtain your installation? Self-compiled? What are the details (version)?

(2) You may proceed as Donal suggests or, alternatively, set encoding system utf-8 explicitly.

encoding system utf-8
file copy $tmp_filename $dest_path
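For option (1), one possible cause (an assumption, nothing in the question confirms it) is that the process that starts Tcl, for example a web server or a daemon, runs without the locale variables your interactive shell has. The sketch below, with the hypothetical proc name locale_report, collects what the Tcl process itself sees so you can compare it with the locale output from your shell.

```tcl
# Hypothetical diagnostic: the locale-related settings visible to this Tcl process.
proc locale_report {} {
    set report [dict create encoding [encoding system]]
    foreach var {LC_ALL LC_CTYPE LANG} {
        if {[info exists ::env($var)]} {
            dict set report $var $::env($var)
        } else {
            dict set report $var "(unset)"
        }
    }
    return $report
}

dict for {key value} [locale_report] {
    puts "$key = $value"
}
```

Run it from the same context that performs the upload, not from an interactive tclsh, since the two may have different environments.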
