C++ and NTFS file: pathname VS opening
This is an extension of this question: fstream not opening files with accent marks in pathname
The problem is the following: a program opening a simple NTFS text file with accent marks in the pathname (e.g. à, ò, ...). In my tests I'm using a file with pathname I:\\università\\foo.txt (università is the Italian translation of university).
The following is the test program:
#include <iostream>
#include <fstream>
#include <string>
#include <cstdio>
#include <cstring>   // strerror
#include <errno.h>
#include <Windows.h>
using namespace std;

// String literals are const, so the const pointer typedefs are used here.
LPCSTR cPath = "I:/università/foo.txt";
LPCWSTR widecPath = L"I:/università/foo.txt";
string path("I:/università/foo.txt");

void tryWithStandardC();
void tryWithStandardCpp();
void tryWithWin32();

int main(int argc, char **argv) {
    tryWithStandardC();
    tryWithStandardCpp();
    tryWithWin32();
    return 0;
}

// Attempt 1: C standard library, narrow path.
void tryWithStandardC() {
    FILE *stream = fopen(cPath, "r");
    if (stream) {
        cout << "File opened with fopen!" << endl;
        fclose(stream);
    }
    else {
        cout << "fopen() failed: " << strerror(errno) << endl;
    }
}

// Attempt 2: C++ streams, narrow path, failures reported as exceptions.
void tryWithStandardCpp() {
    ifstream s;
    s.exceptions(ifstream::failbit | ifstream::badbit | ifstream::eofbit);
    try {
        s.open(path.c_str(), ifstream::in);
        cout << "File opened with c++ open()" << endl;
        s.close();
    }
    catch (const ifstream::failure &f) {
        cout << "Exception " << f.what() << endl;
    }
}

// Attempt 3: Win32 API, first with the narrow path, then with the wide path.
void tryWithWin32() {
    DWORD error;
    HANDLE h = CreateFile(cPath, GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE) {
        error = GetLastError();
        cout << "CreateFile failed: error number " << error << endl;
    }
    else {
        cout << "File opened with CreateFile!" << endl;
        CloseHandle(h);
        return;
    }
    HANDLE wideHandle = CreateFileW(widecPath, GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (wideHandle == INVALID_HANDLE_VALUE) {
        error = GetLastError();
        cout << "CreateFileW failed: error number " << error << endl;
    }
    else {
        cout << "File opened with CreateFileW!" << endl;
        CloseHandle(wideHandle);
    }
}
The source file is saved with UTF-8 encoding. I'm using Windows 8.
This is the output of the program compiled with VC++ (Visual Studio 2012):
fopen() failed: No such file or directory
Exception ios_base::failbit set
CreateFile failed: error number 3
CreateFileW failed: error number 3
This is the output using MinGW g++:
fopen() failed: No such file or directory
Exception basic_ios::clear
CreateFile failed: error number 3
File opened with CreateFileW!
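So let's go to the questions:
1. Why fopen() and std::ifstream works in a similar test in Linux but they don't in Windows?
2. Why CreateFileW() works only compiling with g++?
3. Does exist a cross-platform alternative to CreateFile?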
I hope that opening a generic file with a generic pathname can be done without platform-specific code, but I have no idea how to do it.
Thanks in advance.
You write:
“The source file is saved with UTF-8 encoding.”
Well, that's all well and good (so far) if you're using the g++ compiler, which assumes UTF-8 as its default source encoding. However, Visual C++ will by default assume that the source file is encoded in Windows ANSI, unless it's clearly otherwise. So make very sure that it has a BOM (Byte Order Mark) at the start, which (undocumented, as far as I know) causes Visual C++ to treat it as encoded with UTF-8.
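One way to verify the BOM advice is to check whether the source file really starts with the three bytes EF BB BF. The following is only a minimal sketch; the helper name `hasUtf8Bom` is made up for illustration and is not part of any library.

```cpp
#include <fstream>
#include <string>

// Returns true if the file at `path` begins with the UTF-8 byte order mark
// (bytes EF BB BF). `hasUtf8Bom` is a hypothetical helper name.
bool hasUtf8Bom(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in) {
        return false;  // the file could not be opened at all
    }
    char head[3] = {0, 0, 0};
    in.read(head, 3);
    if (in.gcount() != 3) {
        return false;  // the file is shorter than a BOM
    }
    return static_cast<unsigned char>(head[0]) == 0xEF &&
           static_cast<unsigned char>(head[1]) == 0xBB &&
           static_cast<unsigned char>(head[2]) == 0xBF;
}
```

Calling `hasUtf8Bom` on the .cpp file before building with Visual C++ tells you whether the compiler has the hint it needs.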
You then ask,
“1. Why fopen() and std::ifstream works in a similar test in Linux but they don't in Windows?”
For Linux it's likely to work because (1) modern Linux is UTF-8 oriented, so if the filename looks the same it is likely the same as the identical-looking UTF-8 encoded filename in the source code, and (2) in *nix a filename is just a sequence of bytes, not a sequence of characters. That means that regardless of how it looks, if you pass the identical sequence of bytes, the same values, then you have a match, otherwise not.
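To make the "sequence of bytes" point concrete, the sketch below dumps the bytes of the directory name in two encodings. The byte values are written as explicit escapes so the result does not depend on how the compiler decodes the source file; `toHex`, `nameUtf8` and `nameAnsi` are hypothetical names chosen for this illustration.

```cpp
#include <cstdio>
#include <string>

// Formats every byte of `s` as two uppercase hex digits separated by spaces.
std::string toHex(const std::string& s) {
    std::string out;
    char buf[4];
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        unsigned int byte = static_cast<unsigned char>(s[i]);
        std::snprintf(buf, sizeof buf, "%02X", byte);
        if (i != 0) {
            out += ' ';
        }
        out += buf;
    }
    return out;
}

// "università" with the final à encoded as UTF-8 (bytes C3 A0) ...
const std::string nameUtf8 = "universit\xC3\xA0";
// ... and as Windows-1252 / Latin-1 (single byte E0).
const std::string nameAnsi = "universit\xE0";
```

The two strings look identical when displayed in the matching encoding, but `toHex(nameUtf8)` ends in `C3 A0` while `toHex(nameAnsi)` ends in `E0`, so a byte-wise comparison treats them as different filenames.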
In contrast, in Windows a filename is a sequence of characters that can be encoded in various ways.
In your case the UTF-8 encoded filename in the source code is stored as Windows ANSI in the executable (and yes, the result of building with Visual C++ depends on the selected ANSI codepage in Windows, which as far as I know is also undocumented). Then this gobbledegook string is passed down a routine hierarchy and converted to UTF-16, which is the standard character encoding in Windows. The result doesn't match the filename at all.
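The net effect of that mismatch can be imitated in portable code. The sketch below is an assumption-laden illustration, not what Windows actually runs: it widens each byte as Latin-1, which agrees with Windows-1252 for the bytes involved here, and compares that with a deliberately simplified UTF-8 decoder. All names (`CodePoints`, `decodeAsSingleByte`, `decodeUtf8Simple`) are made up for the example.

```cpp
#include <string>
#include <vector>

typedef std::vector<unsigned int> CodePoints;

// Misreads every byte as one character, the way a single-byte ANSI code page
// does. Latin-1 widening stands in for Windows-1252 here (an assumption).
CodePoints decodeAsSingleByte(const std::string& bytes) {
    CodePoints out;
    for (std::string::size_type i = 0; i < bytes.size(); ++i) {
        out.push_back(static_cast<unsigned char>(bytes[i]));
    }
    return out;
}

// Decodes UTF-8 limited to one- and two-byte sequences (enough for à, ò, ...).
// Anything else is passed through byte by byte; this is not a full decoder.
CodePoints decodeUtf8Simple(const std::string& bytes) {
    CodePoints out;
    for (std::string::size_type i = 0; i < bytes.size(); ++i) {
        unsigned int b0 = static_cast<unsigned char>(bytes[i]);
        if ((b0 & 0xE0) == 0xC0 && i + 1 < bytes.size()) {
            unsigned int b1 = static_cast<unsigned char>(bytes[i + 1]);
            if ((b1 & 0xC0) == 0x80) {
                out.push_back(((b0 & 0x1F) << 6) | (b1 & 0x3F));
                ++i;  // the continuation byte has been consumed
                continue;
            }
        }
        out.push_back(b0);
    }
    return out;
}
```

For the UTF-8 bytes of à (C3 A0), `decodeUtf8Simple` yields the single code point U+00E0, whereas `decodeAsSingleByte` yields U+00C3 followed by U+00A0 ("Ã" plus a no-break space). A path converted that way names a directory that does not exist, which fits error 3 (path not found) in the output above.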
You further ask,
“2. Why CreateFileW() works only compiling with g++?”
Presumably because you did not include a BOM at the start of the source code file (see above).
With a BOM everything works nicely with Visual C++, at least in Windows 7:
File opened with fopen!
File opened with c++ open()
File opened with CreateFile!
Finally, you ask,
“3. Does exist a cross-platform alternative to CreateFile?”
Not really. There is Boost filesystem. But while its version 2 did have a workaround for the standard library's lossy narrow-character-based encoding, that workaround was removed in version 3, which just uses a Visual C++ extension of the standard library, where the Visual C++ implementation provides wide character argument versions of the stream constructors and open. I.e., at least as far as I know (I haven't checked lately if things have been fixed), Boost filesystem only works in general with Visual C++, not with e.g. g++, although it works for filenames without troublesome characters.
The workaround that v2 had was to try conversion to Windows ANSI (the codepage specified by the GetACP function), and if that didn't work, to try GetShortPathName, whose result is practically guaranteed to be representable in Windows ANSI.
Part of the reason that the workaround in Boost filesystem was removed was, as I understand it, that it's in principle possible for the user to turn off the Windows short name functionality, at least in Windows Vista and earlier. However, that's not a practical concern. It just means that there is an easy fix available (namely, turn it back on) if the user experiences problems due to having wilfully lobotomized the system.
The problem you're stumbling over is that the encoding of the path you pass to fstreams is implementation-specific. Further, the behaviour of your program is implementation-defined because it uses characters outside of C++'s basic character set in the code, i.e. the accented characters. The problem there is that there are many different encodings that can be used to represent these characters.
Now, there are solutions:
wchar_t const path[] = {L'f', 0x20ac, L'.', L't', L'x', L't', L'\0'};
Note that while the wchar_t paths are not portable, porting them to a new platform is usually not very complicated. A few #ifdefs and you're set.
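As a rough idea of what "a few #ifdefs" can look like, here is one possible sketch (not the only approach, and not taken from the answers above). It assumes the program keeps paths as UTF-8 internally. On Windows it converts them to UTF-16 with MultiByteToWideChar and calls _wfopen; elsewhere it passes the bytes straight to fopen. The names `openUtf8` and `widenUtf8` are hypothetical.

```cpp
#include <cstdio>
#include <string>

#ifdef _WIN32
#include <windows.h>
#include <wchar.h>
#include <vector>

// Converts UTF-8 to UTF-16; returns an empty string if conversion fails.
static std::wstring widenUtf8(const std::string& utf8) {
    if (utf8.empty()) {
        return std::wstring();
    }
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                                  static_cast<int>(utf8.size()), NULL, 0);
    if (len <= 0) {
        return std::wstring();
    }
    std::vector<wchar_t> buf(len);
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                        static_cast<int>(utf8.size()), &buf[0], len);
    return std::wstring(buf.begin(), buf.end());
}
#endif

// Opens a file whose name is given as UTF-8 bytes. Returns NULL on failure.
// `openUtf8` is a hypothetical helper, not part of any library.
std::FILE* openUtf8(const std::string& utf8Path, const char* mode) {
#ifdef _WIN32
    // Windows: use the wide-character CRT function with UTF-16 arguments.
    std::wstring widePath = widenUtf8(utf8Path);
    std::wstring wideMode = widenUtf8(mode);
    if (widePath.empty() || wideMode.empty()) {
        return NULL;
    }
    return _wfopen(widePath.c_str(), wideMode.c_str());
#else
    // Elsewhere the filename is just bytes, so pass them through unchanged.
    return std::fopen(utf8Path.c_str(), mode);
#endif
}
```

A call such as `openUtf8("I:/universit\xC3\xA0/foo.txt", "r")` spells out the UTF-8 bytes with escapes, so the result no longer depends on how the compiler decodes the source file.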