How to execute a command with multi-byte characters using system function in C++

I am trying to do something like below:

string command = "executable.exe .\\テストプログラム\\filename.ext";
int retval = system(command.c_str());

While debugging, I found that the multi-byte characters are not recognized and show up as garbled characters.

I have also tried storing the command in a batch file first and then executing the batch file.

filesystem::path batFile = filesystem::path(".\\batFile.bat");
string command = "executable.exe .\\テストプログラム\\filename.ext";
writeBatCmd(batFile, command);
int retval = system(batFile.string().c_str());

I found that the multi-byte characters were stored correctly in the .bat file, but on execution the same garbling as above still occurs.

Executing the created .bat file in cmd runs the command correctly.

Using the CreateProcess function instead of the system function does not change the behavior.

My initial guess was that converting the string with c_str caused the behavior, but writing the command to a .bat file and then executing the .bat file disproved that.

Thanks in advance for the help!

EDIT:

Tried solutions:
Solution 1
I set the locale to UTF-8 and then called the program directly, with the command stored in a wstring object. When the multi-byte characters are hardcoded in the wstring, there is no problem. Example:

wstring cmd = L"executable.exe .\\テストプログラム\\filename.ext";

When the path comes from a variable instead, as below, everything from the first multi-byte character to the end of the string is truncated:

wstring cmd = L"executable.exe " + pathToFile + L"\\filename.ext";
// cmd value: "executable.exe .\"

Solution 2
I have also tried a u16string object, and with it the command is stored correctly. The problem is that I cannot call the system function on a u16string. Is there a variant of system that accepts a u16string? Or is there a way to convert a u16string to a wstring without altering the multi-byte characters?

u16string cmd = u"executable.exe .\\テストプログラム\\filename.ext";
// cmd value: executable.exe .\テストプログラム\filename.ext

Solution 3
I tried setting the locale to UTF-8, storing the command in a .bat file, and then executing the .bat file. The command is stored correctly in the .bat file, but when the .bat file is called, the multi-byte characters are not recognized and are displayed as garbled single-byte characters.

setlocale(LC_ALL, "en_US.utf8");
filesystem::path batFile = filesystem::path(".\\batFile.bat");
u16string cmd = u"executable.exe .\\テストプログラム\\filename.ext";
// cmd value: executable.exe .\テストプログラム\filename.ext
writeAsBat(batFile , cmd);
// batfile content: 
//executable.exe .\テストプログラム\filename.ext
//EXIT /B %ERRORLEVEL%
int retval = system(batFile.string().c_str());
/*
Output: 
in .bat file: executable.exe .\テストプログラム\filename.ext
on execution of .bat file: executable.exe .\チE¹トゅログラム\filename.ext
*/

Windows internally uses UTF-16 for all system functions.

If you call the MBCS/ANSI functions as you are doing, arguments are first converted to UTF-16 using the current codepage, then interpreted and executed.

If your current code page is set correctly, this should work. UTF-8 is not a valid code page for this purpose, though; you probably need code page 932.

However, you should really call the wide-character functions for all purposes on Windows.
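As a minimal sketch of the wide-character route, assuming the Microsoft C runtime (where _wsystem is the wide counterpart of system and wchar_t is a 16-bit UTF-16 code unit), the command from the question could be built and run roughly as below. The helper names to_wide, build_command, and run_command are hypothetical, and the non-Windows branch is only a stub so the file compiles elsewhere.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: copies UTF-16 code units into a wstring.
// Lossless only where wchar_t is a 16-bit UTF-16 unit (as on Windows).
std::wstring to_wide(const std::u16string& s) {
    return std::wstring(s.begin(), s.end());
}

// Hypothetical helper: builds the command line used in the question.
std::wstring build_command(const std::wstring& dir) {
    return L"executable.exe " + dir + L"\\filename.ext";
}

// Runs the command through the wide-character CRT entry point.
int run_command(const std::wstring& cmd) {
#ifdef _WIN32
    return _wsystem(cmd.c_str());  // takes wchar_t*, so no narrow code page conversion
#else
    (void)cmd;                     // stub: _wsystem is specific to the Microsoft CRT
    return -1;
#endif
}
```

With these helpers, the u16string from Solution 2 could be passed along as run_command(build_command(to_wide(dir))), where dir holds the directory part of the path.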

Activating my psychic debugging powers, I will guess that your C++ file is in UTF-8.

Update: Since April 2018 you can set UTF-8 as the current character set in the C runtime. https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-160#utf-8-support

Unpacking a little more

What is probably happening is that, when you compile, your string literal is converted to a sequence of bytes, probably in UTF-8 encoding. These bytes are then written to the batch file. But batch files cannot be written in UTF-8; they must be written in the current code page (whatever that is; in your case probably Japanese code page 932).
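One way to check which encoding actually ended up in the narrow string is to dump its bytes before they reach system or the batch file. The following is a minimal sketch; hex_bytes is a hypothetical debugging helper, not part of any library. If the katakana テ shows up as the three bytes E3 83 86, the literal was compiled as UTF-8; a two-byte sequence would instead point to a legacy code page such as 932.

```cpp
#include <cstdio>
#include <string>

// Hypothetical debugging helper: renders each byte of a narrow string
// as two uppercase hex digits, separated by spaces.
std::string hex_bytes(const std::string& s) {
    std::string out;
    char buf[4];
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(c));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

Printing hex_bytes(command) next to the call to system makes it easy to tell whether the garbling happens at compile time or during the later code page conversion.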

Solving your problem

It looks like you want to write a batch file because you are having difficulty calling your program, and have reached for a batch file as a solution.

If that's the case you may have better luck setting the C locale to UTF-8, and calling the program directly, or using the wide-character APIs to do so.

https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-160#utf-8-support
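If the Win32 API is preferred over the C runtime, a rough sketch with CreateProcessW might look like the following. The question reports that CreateProcess did not help; that would be expected if the narrow CreateProcessA variant was in effect (an assumption, since the call is not shown), because the CreateProcess macro resolves to it when UNICODE is not defined. The helper names are hypothetical, and the non-Windows branch is only a stub.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif
#include <string>
#include <vector>

// CreateProcessW may modify its command-line argument, so it needs a
// writable, null-terminated buffer rather than a const string.
std::vector<wchar_t> make_mutable_command_line(const std::wstring& cmd) {
    std::vector<wchar_t> buf(cmd.begin(), cmd.end());
    buf.push_back(L'\0');
    return buf;
}

// Hypothetical wrapper: runs cmd, waits for it, and returns its exit code
// (or -1 if the process could not be started).
int run_wide(const std::wstring& cmd) {
#ifdef _WIN32
    std::vector<wchar_t> buf = make_mutable_command_line(cmd);
    STARTUPINFOW si{};
    si.cb = sizeof(si);
    PROCESS_INFORMATION pi{};
    if (!CreateProcessW(nullptr, buf.data(), nullptr, nullptr, FALSE, 0,
                        nullptr, nullptr, &si, &pi)) {
        return -1;
    }
    WaitForSingleObject(pi.hProcess, INFINITE);
    DWORD code = 0;
    GetExitCodeProcess(pi.hProcess, &code);
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return static_cast<int>(code);
#else
    (void)cmd;  // stub: CreateProcessW exists only on Windows
    return -1;
#endif
}
```

Because the command line stays in UTF-16 from the literal to the API call, no code page conversion takes place along the way.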
