C++ and NTFS file: pathname VS opening

This is an extension of this question: fstream not opening files with accent marks in pathname

The problem is the following: a program needs to open a simple text file on an NTFS volume whose pathname contains accented characters (e.g. à, ò, ...). In my tests I'm using a file with the pathname I:\università\foo.txt (università is the Italian translation of university).

The following is the test program:

#include <iostream>
#include <fstream>
#include <string>
#include <cstdio>
#include <cstring>
#include <errno.h>
#include <Windows.h>

using namespace std;

LPCSTR cPath = "I:/università/foo.txt";
LPCWSTR widecPath = L"I:/università/foo.txt";
string path("I:/università/foo.txt");

void tryWithStandardC();
void tryWithStandardCpp();
void tryWithWin32();

int main(int argc, char **argv) {
    tryWithStandardC();
    tryWithStandardCpp();
    tryWithWin32();

    return 0;
} 

void tryWithStandardC() {
    FILE *stream = fopen(cPath, "r");

    if (stream) {
        cout << "File opened with fopen!" << endl;
        fclose(stream);
    }

    else {
        cout << "fopen() failed: " << strerror(errno) << endl;
    }
}

void tryWithStandardCpp() {
    ifstream s;
    s.exceptions(ifstream::failbit | ifstream::badbit | ifstream::eofbit);      

    try {
        s.open(path.c_str(), ifstream::in);
        cout << "File opened with c++ open()" << endl;
        s.close();
    }

    catch (const ifstream::failure &f) {
        cout << "Exception " << f.what() << endl;
    }   
}

void tryWithWin32() {

    DWORD error;
    HANDLE h = CreateFile(cPath, GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

    if (h == INVALID_HANDLE_VALUE) {
        error = GetLastError();
        cout << "CreateFile failed: error number " << error << endl;
    }

    else {
        cout << "File opened with CreateFile!" << endl;
        CloseHandle(h);
        return;
    }

    HANDLE wideHandle = CreateFileW(widecPath, GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

    if (wideHandle == INVALID_HANDLE_VALUE) {
        error = GetLastError();
        cout << "CreateFileW failed: error number " << error << endl;
    }

    else {
        cout << "File opened with CreateFileW!" << endl;
        CloseHandle(wideHandle);
    }
}

The source file is saved with UTF-8 encoding. I'm using Windows 8.

This is the output of the program compiled with VC++ (Visual Studio 2012):

fopen() failed: No such file or directory
Exception ios_base::failbit set
CreateFile failed: error number 3
CreateFileW failed: error number 3

This is the output using MinGW g++:

fopen() failed: No such file or directory
Exception basic_ios::clear
CreateFile failed: error number 3
File opened with CreateFileW!

So let's go to the questions:

  1. Why do fopen() and std::ifstream work in a similar test on Linux but not on Windows?
  2. Why does CreateFileW() work only when compiling with g++?
  3. Is there a cross-platform alternative to CreateFile?

I hope that opening a generic file with a generic pathname can be done without platform-specific code, but I have no idea how to do it.

Thanks in advance.

You write:

“The source file is saved with UTF-8 encoding.”

Well, that's all well and good (so far) if you're using the g++ compiler, which assumes UTF-8 as its default source character set. However, Visual C++ will by default assume that the source file is encoded in Windows ANSI, unless it clearly isn't. So make very sure that the file has a BOM (Byte Order Mark) at the start, which (undocumented, as far as I know) causes Visual C++ to treat it as UTF-8 encoded.

You then ask,

“1. Why do fopen() and std::ifstream work in a similar test on Linux but not on Windows?”

For Linux it's likely to work because (1) modern Linux is UTF-8 oriented, so if the filename looks the same it is likely the same as the identical-looking UTF-8 encoded filename in the source code, and (2) in *nix a filename is just a sequence of bytes, not a sequence of characters. That means that regardless of how it looks, if you pass the identical sequence of bytes, you have a match; otherwise you don't.
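To illustrate the "sequence of bytes" point, here is a small sketch. It assumes nothing beyond standard Unicode facts: the letter à can be stored precomposed (U+00E0) or as a plus a combining grave accent (U+0300). Both render identically, but their UTF-8 bytes differ, so a byte-oriented filesystem treats them as different names. The identifiers are made up for the example.

```cpp
#include <string>

// Two spellings that both render as "università":
//   precomposed U+00E0            -> UTF-8 bytes C3 A0
//   'a' + combining grave U+0300  -> UTF-8 bytes 61 CC 80
const std::string precomposed = "universit\xC3\xA0";
const std::string decomposed  = "universita\xCC\x80";

// On a byte-oriented filesystem, two names match only if their bytes match.
bool sameFilenameBytes(const std::string& a, const std::string& b) {
    return a == b;
}
```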

In contrast, in Windows a filename is a sequence of characters that can be encoded in various ways.

In your case the UTF-8 encoded filename in the source code is stored as Windows ANSI in the executable (and yes, the result of building with Visual C++ depends on the selected ANSI codepage in Windows, which, as far as I know, is also undocumented). Then this gobbledegook string is passed down a hierarchy of routines and converted to UTF-16, which is the standard character encoding in Windows. The result doesn't match the filename at all.
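The mismatch can be reproduced without Windows. The sketch below assumes the ANSI codepage is Windows-1252, where the bytes C3 and A0 map to the code points U+00C3 (Ã) and U+00A0 (no-break space). It decodes the same bytes once as UTF-8 and once byte by byte. Both function names are hypothetical, and the UTF-8 decoder handles only 1- and 2-byte sequences, which is enough for this path.

```cpp
#include <cstddef>
#include <string>

// Interpret every byte as one character, as a single-byte ANSI codepage does.
// For the bytes used here, Windows-1252 agrees with Latin-1, where the byte
// value equals the code point.
std::u32string decodeAsSingleByte(const std::string& bytes) {
    std::u32string out;
    for (unsigned char b : bytes) out.push_back(b);
    return out;
}

// Minimal UTF-8 decoder for 1- and 2-byte sequences only (demo purposes).
std::u32string decodeUtf8Basic(const std::string& bytes) {
    std::u32string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        unsigned char b = bytes[i];
        if (b >= 0xC0 && b < 0xE0 && i + 1 < bytes.size()) {
            unsigned char next = bytes[++i];
            out.push_back(((b & 0x1F) << 6) | (next & 0x3F));
        } else {
            out.push_back(b);
        }
    }
    return out;
}
```

For the bytes of "I:/universit\xC3\xA0/foo.txt", the UTF-8 reading yields a single à (U+00E0), while the byte-by-byte reading yields two characters in its place, so the resulting UTF-16 name is "universitÃ" followed by a no-break space, which is not the directory on disk.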


You further ask,

“2. Why does CreateFileW() work only when compiling with g++?”

Presumably because you did not include a BOM at the start of the source code file (see above).

With a BOM everything works nicely with Visual C++, at least in Windows 7:

File opened with fopen!
File opened with c++ open()
File opened with CreateFile!

Finally, you ask,

“3. Is there a cross-platform alternative to CreateFile?”

Not really. There is Boost filesystem. But while its version 2 did have a workaround for the standard library's lossy narrow-character-based encoding, that workaround was removed in version 3, which just uses a Visual C++ extension of the standard library, where the Visual C++ implementation provides wide character argument versions of the stream constructors and open. I.e., at least as far as I know (I haven't checked lately whether things have been fixed), Boost filesystem only works in general with Visual C++, not with e.g. g++, although it does work for filenames without troublesome characters.

The workaround that v2 had was to try conversion to Windows ANSI (the codepage reported by the GetACP function), and if that didn't work, to try GetShortPathName, whose result is practically guaranteed to be representable in Windows ANSI.

Part of the reason that the workaround in Boost filesystem was removed was, as I understand it, that it's in principle possible for the user to turn off the Windows short name functionality, at least in Windows Vista and earlier. However, that's not a practical concern. It just means that there is an easy fix available (namely, turn it back on) if the user experiences problems due to having wilfully lobotomized the system.

The problem you're stumbling over is that the encoding of the path you pass to fstreams is implementation-specific. Further, the behaviour of your program is implementation-defined because it uses characters outside of C++'s basic character set in the code, i.e. the accented characters. The problem there is that there are many different encodings that can be used to represent these characters.

Now, there are solutions:

  • Firstly, there is an MSC extension to tell the compiler which encoding it should assume.
  • In order to get a path working with CreateFileW(), you can code the path like wchar_t const path[] = {L'f', 0x20ac, L'.', L't', L'x', L't', 0}; (note the terminating 0, which CreateFileW() needs). This is not really comfortable, but in practice paths are rarely hard-coded; they are read from files in some Unicode encoding or entered by the user.
  • Then, there is an extension in the implementation of the standard library that allows you to use wchar_t paths: both _wfopen() and fstream constructors taking wchar_t paths are available.
  • Then, there is Boost, which has a filesystem and an iostream library that are specifically made to provide portability. I would definitely look at this.
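A somewhat more comfortable variant of the hard-coded wchar_t path is a wide string literal with a universal character name, so that the source file itself contains only plain ASCII and the result does not depend on how the compiler guesses the file's encoding. This is just a sketch; widePath is a name chosen for the example.

```cpp
#include <cwchar>

// U+00E0 is LATIN SMALL LETTER A WITH GRAVE. Written as a universal character
// name, the literal uses only ASCII characters in the source file.
const wchar_t widePath[] = L"I:/universit\u00E0/foo.txt";
```

On Windows, widePath could then be passed to CreateFileW() or _wfopen() in place of widecPath from the question.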

Note that while the wchar_t paths are not portable, porting them to a new platform is usually not very complicated. A few #ifdefs and you're set.
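As a rough sketch of what those #ifdefs might look like: the wrapper below assumes the program keeps all paths as UTF-8 internally (an assumption of this example, not something the question states). On Windows it converts to UTF-16 with MultiByteToWideChar and calls _wfopen; elsewhere it passes the bytes straight to fopen. The names openUtf8 and utf8ToWide are hypothetical.

```cpp
#include <cstdio>
#include <string>

#ifdef _WIN32
#include <windows.h>

// Convert a UTF-8 string to UTF-16 for the wide CRT/Win32 functions.
static std::wstring utf8ToWide(const std::string& utf8) {
    if (utf8.empty()) return std::wstring();
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                                  static_cast<int>(utf8.size()), NULL, 0);
    std::wstring wide(len, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                        static_cast<int>(utf8.size()), &wide[0], len);
    return wide;
}
#endif

// Open a file whose path is given as UTF-8, on Windows and on *nix alike.
std::FILE* openUtf8(const std::string& utf8Path, const std::string& mode) {
#ifdef _WIN32
    // Wide-character path: independent of the ANSI codepage.
    return _wfopen(utf8ToWide(utf8Path).c_str(), utf8ToWide(mode).c_str());
#else
    // Byte-oriented filesystem: pass the UTF-8 bytes through unchanged.
    return std::fopen(utf8Path.c_str(), mode.c_str());
#endif
}
```

A call such as openUtf8("I:/universit\xC3\xA0/foo.txt", "r") spells the path with explicit UTF-8 bytes, so it does not depend on the source file's encoding either.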
