
C++ and NTFS file: pathname VS opening

This is an extension of this question: fstream not opening files with accent marks in pathname

The problem is the following: a program fails to open a simple text file on NTFS when the pathname contains accented characters (e.g. à, ò, ...). In my tests I'm using a file with the pathname I:\università\foo.txt (università is the Italian translation of university).

The following is the test program:

#include <iostream>
#include <fstream>
#include <string>
#include <cstdio>
#include <cstring>   // strerror
#include <cerrno>    // errno
#include <Windows.h>

using namespace std;

// String literals are const, so use the pointer-to-const typedefs.
LPCSTR cPath = "I:/università/foo.txt";
LPCWSTR widecPath = L"I:/università/foo.txt";
string path("I:/università/foo.txt");

void tryWithStandardC();
void tryWithStandardCpp();
void tryWithWin32();

int main(int argc, char **argv) {
    tryWithStandardC();
    tryWithStandardCpp();
    tryWithWin32();

    return 0;
} 

void tryWithStandardC() {
    FILE *stream = fopen(cPath, "r");

    if (stream) {
        cout << "File opened with fopen!" << endl;
        fclose(stream);
    }

    else {
        cout << "fopen() failed: " << strerror(errno) << endl;
    }
}

void tryWithStandardCpp() {
    ifstream s;
    s.exceptions(ifstream::failbit | ifstream::badbit | ifstream::eofbit);      

    try {
        s.open(path.c_str(), ifstream::in);
        cout << "File opened with c++ open()" << endl;
        s.close();
    }

    catch (const ifstream::failure &f) {
        cout << "Exception " << f.what() << endl;
    }   
}

void tryWithWin32() {

    DWORD error;
    // Call the narrow (ANSI) version explicitly so this also compiles when UNICODE is defined.
    HANDLE h = CreateFileA(cPath, GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

    if (h == INVALID_HANDLE_VALUE) {
        error = GetLastError();
        cout << "CreateFile failed: error number " << error << endl;
    }

    else {
        cout << "File opened with CreateFile!" << endl;
        CloseHandle(h);
        return;
    }

    HANDLE wideHandle = CreateFileW(widecPath, GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

    if (wideHandle == INVALID_HANDLE_VALUE) {
        error = GetLastError();
        cout << "CreateFileW failed: error number " << error << endl;
    }

    else {
        cout << "File opened with CreateFileW!" << endl;
        CloseHandle(wideHandle);
    }
}

The source file is saved with UTF-8 encoding. I'm using Windows 8.

This is the output of the program compiled with VC++ (Visual Studio 2012)

fopen() failed: No such file or directory
Exception ios_base::failbit set
CreateFile failed: error number 3
CreateFileW failed: error number 3

This is the output using MinGW g++

fopen() failed: No such file or directory
Exception basic_ios::clear
CreateFile failed: error number 3
File opened with CreateFileW!

So let's go to the questions:

  1. Why fopen() and std::ifstream works in a similar test in Linux but they don't in Windows?
  2. Why CreateFileW() works only compiling with g++?
  3. Does exist a cross-platform alternative to CreateFile?

I hope that opening a generic file with a generic pathname can be done without platform-specific code, but I have no idea how to do it.

Thanks in advance.

You write:

“The source file is saved with UTF-8 encoding.”

Well that's all well and good (so far) if you're using the g++ compiler, which has UTF-8 as its default basic source character set. However, Visual C++ will by default assume that the source file is encoded in Windows ANSI, unless it's clearly otherwise. So make very sure that it has a BOM (Byte Order Mark) at the start, which – undocumented, as far as I know – causes Visual C++ to treat it as encoded with UTF-8.
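If you want to check whether a given source file really starts with a BOM, something like the following should do. It is only a sketch: the function names has_utf8_bom and file_has_utf8_bom are made up for this illustration, and all it does is compare the first three bytes against EF BB BF, the UTF-8 encoding of U+FEFF.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: true if the byte string starts with the UTF-8 BOM (EF BB BF).
bool has_utf8_bom(const std::string& data) {
    return data.size() >= 3 &&
           static_cast<unsigned char>(data[0]) == 0xEF &&
           static_cast<unsigned char>(data[1]) == 0xBB &&
           static_cast<unsigned char>(data[2]) == 0xBF;
}

// Hypothetical helper: reads the first three bytes of a file in binary mode
// and checks them for the BOM. Returns false if the file is shorter than that
// or cannot be opened.
bool file_has_utf8_bom(const char* path) {
    std::ifstream in(path, std::ios::binary);
    char buf[3] = {0, 0, 0};
    in.read(buf, 3);
    return in.gcount() == 3 && has_utf8_bom(std::string(buf, 3));
}
```

Calling file_has_utf8_bom on the test program's source file tells you whether the editor actually wrote the BOM.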

You then ask,

“1. Why fopen() and std::ifstream works in a similar test in Linux but they don't in Windows?”

For Linux it's likely to work because (1) modern Linux is UTF-8 oriented, so if the filename looks the same it is likely the same as the identical looking UTF-8 encoded filename in the source code, and (2) in *nix a filename is just a sequence of bytes, not a sequence of characters. Which means that regardless of how it looks, if you pass the identical sequence of bytes, the same values, then you have a match, otherwise not.

In contrast, in Windows a filename is a sequence of characters that can be encoded in various ways.

In your case the UTF-8 encoded filename in the source code is stored as Windows ANSI in the executable (and yes, the result of building with Visual C++ depends on the selected ANSI codepage in Windows, which also as far as I know is undocumented). Then this gobbledegook string is passed down a routine hierarchy and converted to UTF-16, which is the standard character encoding in Windows. The result doesn't match the filename at all.
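To make the mismatch concrete, here is a small sketch that decodes the same bytes in two ways. The function names are invented for this example, and the byte-per-character decoder uses Latin-1 as a stand-in for the Windows ANSI codepage (the two agree for the bytes C3 and A0 when the codepage is Windows-1252, which is an assumption about your system). The UTF-8 decoder does no validation; it is only meant for well-formed input.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Misinterpretation: every byte becomes one character (Latin-1 style).
std::vector<std::uint32_t> decode_as_latin1(const std::string& bytes) {
    std::vector<std::uint32_t> out;
    for (unsigned char b : bytes) out.push_back(b);
    return out;
}

// Intended interpretation: decode well-formed UTF-8 into code points.
// No validation is done, so this is for illustration only.
std::vector<std::uint32_t> decode_as_utf8(const std::string& bytes) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        std::uint32_t cp;
        int extra;  // number of continuation bytes that follow the lead byte
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else                         { cp = b & 0x07; extra = 3; }
        ++i;
        for (; extra > 0 && i < bytes.size(); --extra, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
        out.push_back(cp);
    }
    return out;
}
```

For the two bytes C3 A0 that encode à in UTF-8, decode_as_utf8 yields the single code point U+00E0, while decode_as_latin1 yields U+00C3 followed by U+00A0, i.e. "Ã" plus a no-break space. A path containing those two characters is not the path on disk.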


You further ask,

“2. Why CreateFileW() works only compiling with g++?”

Presumably because you did not include a BOM at the start of the source code file (see above).

With a BOM everything works nicely with Visual C++, at least in Windows 7:

File opened with fopen!
File opened with c++ open()
File opened with CreateFile!

Finally, you ask,

“3. Does exist a cross-platform alternative to CreateFile?”

Not really. There is Boost filesystem. But while its version 2 had a workaround for the standard library's lossy narrow-character encoding, that workaround was removed in version 3, which just uses a Visual C++ extension of the standard library: the Visual C++ implementation provides wide-character-argument versions of the stream constructors and of open. That is, at least as far as I know (I haven't checked lately whether things have been fixed), Boost filesystem only works in general with Visual C++, not with, for example, g++, although it does work for filenames without troublesome characters.

The workaround that v2 had was to try conversion to Windows ANSI (the codepage specified by the GetACP function) and, if that didn't work, to try GetShortPathName, whose result is practically guaranteed to be representable in Windows ANSI.

Part of the reason that the workaround in Boost filesystem was removed was, as I understand it, that it's in principle possible for the user to turn off the Windows short name functionality at least in Windows Vista and earlier. However that's not a practical concern. It just means that there is an easy fix available (namely turn it back on) if the user experiences problems due to having wilfully lobotomized the system.

The problem you're stumbling over is that the encoding of the path you pass to fstreams is implementation-specific. Further, the behaviour of your program is implementation-defined because the code uses characters outside C++'s basic character set, i.e. the accented characters. The problem there is that there are many different encodings that can be used to represent these characters.

Now, there are solutions:

  • Firstly, there is an MSC extension to tell the compiler which encoding it should assume.
  • In order to get a path working with CreateFileW(), you can code the path like wchar_t const path[] = {L'f', 0x20ac, L'.', L't', L'x', L't', 0}; (note the terminating 0, which CreateFileW() needs to find the end of the string). This is not really comfortable, but in practice paths are usually read from files with some Unicode encoding or input by the user.
  • Then, there is an extension in the implementation of the standard library that allows you to use wchar_t paths, there are both _wfopen() and fstream constructors.
  • Then, there is Boost, which has filesystem and iostreams libraries that are specifically made to provide portable file and path handling. I would definitely look at this.
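As a less awkward variant of spelling out the code units by hand, standard C++ also lets you write non-ASCII characters as universal character names, which makes the literal independent of the source file's encoding. A minimal sketch, using the path from the question (the function name wide_demo_path is made up for this example):

```cpp
#include <string>

// \u00E0 is the universal character name for à (U+00E0), so the compiler
// never has to guess how the source file itself is encoded.
std::wstring wide_demo_path() {
    return L"I:/universit\u00E0/foo.txt";
}
```

The result of wide_demo_path().c_str() can then be passed to CreateFileW() or _wfopen() as the path argument.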

Note that while the wchar_t paths are not portable, porting them to a new platform is usually not very complicated. A few #ifdefs and you're set.
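The "few #ifdefs" could look roughly like this. It is a sketch under some assumptions: the function name open_utf8_path is invented here, the caller is assumed to hold the path as UTF-8 bytes, and on Windows the path is converted to UTF-16 with MultiByteToWideChar and opened with _wfopen, while other platforms pass the bytes straight to fopen.

```cpp
#include <cstdio>
#include <cstring>
#include <string>
#ifdef _WIN32
#include <wchar.h>
#include <windows.h>
#endif

// Hypothetical helper: opens a file whose path is given as UTF-8.
// Returns a null pointer on failure, like fopen.
std::FILE* open_utf8_path(const std::string& utf8_path, const char* mode) {
#ifdef _WIN32
    // First call asks for the required length (including the terminating null).
    int len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                  utf8_path.c_str(), -1, nullptr, 0);
    if (len <= 0) return nullptr;  // path was not valid UTF-8
    std::wstring wide_path(static_cast<std::size_t>(len), L'\0');
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                        utf8_path.c_str(), -1, &wide_path[0], len);
    // The mode string is plain ASCII, so widening it char by char is enough.
    std::wstring wide_mode(mode, mode + std::strlen(mode));
    return _wfopen(wide_path.c_str(), wide_mode.c_str());
#else
    // On *nix the filename is just bytes, so the UTF-8 path is used as-is.
    return std::fopen(utf8_path.c_str(), mode);
#endif
}
```

With that, open_utf8_path("I:/universit\xC3\xA0/foo.txt", "r") does not depend on the source file's encoding or on the ANSI codepage, because the accented character is written as explicit UTF-8 bytes.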
