How to execute a command with multi-byte characters using the system function in C++

I am trying to do something like the following:

string command = "executable.exe .\\テストプログラム\\filename.ext";
int retval = system(command.c_str());

While debugging, I discovered that the multi-byte characters are not recognized and are displayed as random characters.

I have also tried storing the command in a batch file first and then executing the batch file.

filesystem::path batFile = filesystem::path(".\\batFile.bat");
string command = "executable.exe .\\テストプログラム\\filename.ext";
writeBatCmd(batFile, command);
int retval = system(batFile.string().c_str());

I found that the multi-byte characters were stored correctly in the .bat file, but on execution the same problem as above still occurs.

Executing the created .bat file from cmd runs the command correctly.

Using the CreateProcess function instead of the system function does not change the behavior.

My initial guess was that converting the string with c_str was what caused the behavior, but writing the command to a .bat file and then executing the .bat file disproved it.

Thanks in advance for the help!

EDIT:

Tried solutions:

Solution 1
Setting the locale to utf8, then calling the program directly. The command to execute the program is stored in a wstring object. When the multi-byte characters are hardcoded in the wstring object, there is no problem. Example:

wstring cmd = L"executable.exe .\\テストプログラム\\filename.ext";

When something like this is executed, everything from the first multi-byte character to the end of the string is truncated:

wstring cmd = L"executable.exe " + pathToFile + L"\\filename.ext";
// cmd value: "executable.exe .\"

Solution 2
I have also tried using a u16string object. With it, the command is stored correctly. The problem is that I cannot call the system function on it because it is a u16string. Is there a system function that accepts a u16string? Or is there a way to convert a u16string to a wstring without altering the multi-byte characters?

u16string cmd = u"executable.exe .\\テストプログラム\\filename.ext";
// cmd value: executable.exe .\テストプログラム\filename.ext

Solution 3
I tried setting the locale to utf8, then storing the command in a .bat file and executing the .bat file. The command is stored correctly in the .bat file. When the .bat file is run, the multi-byte characters are not recognized and are displayed as single-byte characters.

setlocale(LC_ALL, "en_US.utf8");
filesystem::path batFile = filesystem::path(".\\batFile.bat");
u16string cmd = u"executable.exe .\\テストプログラム\\filename.ext";
// cmd value: executable.exe .\テストプログラム\filename.ext
writeAsBat(batFile , cmd);
// batfile content: 
//executable.exe .\テストプログラム\filename.ext
//EXIT /B %ERRORLEVEL%
int retval = system(batFile.string().c_str());
/*
Output: 
in .bat file: executable.exe .\テストプログラム\filename.ext
on execution of .bat file: executable.exe .\チE¹トゅログラム\filename.ext
*/

Windows internally uses UTF-16 for all system functions.

If you call the MBCS/ANSI functions, as you are doing, the arguments are first converted to UTF-16 using the current codepage, then interpreted and executed.

If your current codepage is set correctly - and UTF-8 is not a valid codepage - then this should work. You probably need codepage 932.

However, you should really call the wide-character functions for all purposes on Windows.
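As a rough illustration of the wide-character route, here is a minimal sketch that converts a u16string to a wstring and hands it to _wsystem, the wide-character counterpart of system in the Microsoft C runtime. The helper names to_wide and run_wide are hypothetical and not part of the original post. On Windows, wchar_t is 16 bits wide, so the conversion simply copies the UTF-16 code units unchanged. The branch for a wider wchar_t and the -1 fallback exist only so the sketch also compiles on other platforms.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: convert UTF-16 code units to a std::wstring.
// With a 16-bit wchar_t (Windows) the code units are copied as they are.
// With a wider wchar_t, surrogate pairs are combined into one code point.
std::wstring to_wide(const std::u16string& in) {
    std::wstring out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (sizeof(wchar_t) >= 4 &&
            cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;  // the low surrogate has been consumed
        }
        out.push_back(static_cast<wchar_t>(cp));
    }
    return out;
}

// Hypothetical helper: run a command through the wide-character API.
int run_wide(const std::u16string& command) {
#ifdef _WIN32
    // _wsystem receives the command as UTF-16, so no codepage
    // conversion is applied to the command string itself.
    return _wsystem(to_wide(command).c_str());
#else
    (void)command;
    return -1;  // assumption: this sketch only targets Windows
#endif
}
```

A call such as run_wide(u"executable.exe .\\テストプログラム\\filename.ext") would then pass the path to the command interpreter as UTF-16. Whether the launched program itself handles the argument correctly is a separate question.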

Activating my psychic debugging powers, I will guess that your C++ file is in UTF-8.

Update: Since April 2018 you can set UTF-8 as the current character set in C. https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-160#utf-8-support

Unpacking a little more

What is probably happening is that when you compile, your C string is converted to a sequence of bytes, probably in UTF-8 encoding. These bytes are then written to the batch file. But batch files cannot be written in UTF-8; they must be written in the current codepage (whatever that is, in your case probably Japanese codepage 932).
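One way to check this guess is to dump the bytes that actually end up in the narrow string. The sketch below uses a hypothetical helper, hex_bytes, which is not part of the original post. If the first katakana character テ (U+30C6) shows up as the three bytes E3 83 86, the string is UTF-8. In codepage 932 the same character would be a two-byte sequence instead.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: render each byte of a narrow string as two
// uppercase hex digits, separated by spaces.
std::string hex_bytes(const std::string& s) {
    std::string out;
    char buf[4];
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(c));
        if (!out.empty()) {
            out += ' ';
        }
        out += buf;
    }
    return out;
}
```

Printing hex_bytes(command) just before the command is written to the .bat file shows which encoding the compiler chose for the literal.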

Solving your problem

It looks like you want to write a batch file because you are having difficulty calling your program, and have reached for a batch file as a workaround.

If that's the case, you may have better luck setting the C locale to UTF-8 and calling the program directly, or using the wide-character APIs to do so.

https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-160#utf-8-support
