
How to read a file name containing 'œ' as a character in C/C++ on Windows

This post is not a duplicate of "dirent not working with unicode", because I'm using dirent on a different OS and I'm not trying to do the same thing. The other thread simply tries to count the files, whereas I want to access the file names, which is more complex.


I'm trying to retrieve data from file names on Windows 10.

For this purpose I use dirent.h (an external C library, but still very useful in C++ too).

DIR* directory = opendir(path);
struct dirent* direntStruct;

if (directory != NULL)
{
    // readdir returns NULL once there are no more entries
    while ((direntStruct = readdir(directory)) != NULL)
    {
        cout << direntStruct->d_name << endl;
    }
    closedir(directory);
}

This code retrieves the names of all files located in a specific folder, one by one, and it works pretty well!

But when it encounters a file whose name contains the character 'œ', things go wrong:

Example:

grosse blessure au cœur.txt

is read in my program as:

GUODU0~6.TXT

I can't find the original data in the string, because as you can see, the name I get back looks nothing like the actual file name!

I could rename the file and it would work, but I don't want to do that. I just need to read the data from that file name, and it seems impossible. How can I do this?

On Windows, you can use FindFirstFile() or FindFirstFileEx() followed by FindNextFile() to read the contents of a directory; the file names they return can contain Unicode.

Short File Name

The name you receive is the 8.3 short file name that NTFS generates for non-ASCII file names, so that they can be accessed by programs that don't support Unicode.
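To tell whether a name coming back from dirent is such an alias rather than the real name, a rough check along these lines may help. This is only a heuristic sketch: the function name looks_like_short_name is made up for illustration, and it assumes the generated alias has a base of at most 8 characters ending in '~' plus digits, an extension of at most 3 characters, and no spaces, lowercase letters or non-ASCII bytes.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Heuristic (hypothetical helper): does `name` look like a generated
// 8.3 alias such as "GUODU0~6.TXT"? It cannot prove that a name is an
// alias; a real file could legitimately be named this way.
bool looks_like_short_name(const std::string& name) {
    const std::size_t dot = name.rfind('.');
    const std::string base = name.substr(0, dot);  // whole name if no dot
    const std::string ext =
        (dot == std::string::npos) ? "" : name.substr(dot + 1);

    // 8.3 layout: at most 8 characters, then at most 3 for the extension.
    if (base.empty() || base.size() > 8 || ext.size() > 3) return false;

    // Assumed alias alphabet: ASCII only, no spaces, no lowercase letters.
    for (unsigned char c : name) {
        if (c >= 0x80 || c == ' ' || std::islower(c)) return false;
    }

    // The base must end in '~' followed by one or more digits.
    const std::size_t tilde = base.rfind('~');
    if (tilde == std::string::npos || tilde + 1 == base.size()) return false;
    for (std::size_t i = tilde + 1; i < base.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(base[i]))) return false;
    }
    return true;
}
```

With this check, the name from the question, GUODU0~6.TXT, is flagged as a probable alias, while the original long name is not.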

Clinging to dirent

If dirent doesn't support UTF-16, your best bet may be to change your library.

However, depending on the implementation of the library, you may have luck with:

  • adding or changing the manifest of your application to support UTF-8 in char-based Windows APIs. This requires a very recent version of Windows 10.
    See MSDN: Use the UTF-8 code page (under Windows - Apps - UWP - Design and UI - Usability - Globalization and localization).

  • setting the C++ runtime's code page to UTF-8 using setlocale

I do not recommend this, and I don't know whether it will work.

Life is change

Use std::filesystem to enumerate directory content. A simple example can be found here (see the "Update 2017").
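As a minimal sketch of that approach (assuming a C++17 compiler; the helper names to_utf8 and list_file_names are invented for this example), the entries of a folder can be collected as UTF-8 strings like this:

```cpp
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Convert a path to a UTF-8 std::string.
// u8string() returns std::string in C++17 and std::u8string in C++20,
// so the bytes are copied to stay compatible with both.
std::string to_utf8(const fs::path& p) {
    const auto u8 = p.u8string();
    return std::string(u8.begin(), u8.end());
}

// Collect the UTF-8 names of all entries directly inside `directory`.
// Throws fs::filesystem_error if the directory cannot be opened.
std::vector<std::string> list_file_names(const fs::path& directory) {
    std::vector<std::string> names;
    for (const fs::directory_entry& entry : fs::directory_iterator(directory)) {
        names.push_back(to_utf8(entry.path().filename()));
    }
    return names;
}
```

On Windows the path is held internally as UTF-16, so the long name is returned rather than the 8.3 alias. Printing the UTF-8 result to a console is a separate concern, since the console code page may not be UTF-8.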

Windows only

You can use FindFirstFileW and FindNextFileW as platform APIs that support UTF-16 strings. However, with std::filesystem there's little reason to do so (at least for your use case).

If you're in C, use the OS functions directly, specifically FindFirstFileW and FindNextFileW. Note the W at the end: you want the wide versions of these functions to get back the full non-ASCII name.
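A rough sketch of that loop follows, written in C++ for brevity (the same three calls are available from C). The function name list_names_win32 is invented for this example, and the _WIN32 guard is only there so that the snippet still compiles, as an empty stub, on other platforms.

```cpp
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#endif

// Return the long (Unicode) names of all entries in `directory` as
// UTF-16 wide strings. Returns an empty list if the directory cannot
// be opened, and always on non-Windows platforms.
std::vector<std::wstring> list_names_win32(const std::wstring& directory) {
    std::vector<std::wstring> names;
#ifdef _WIN32
    WIN32_FIND_DATAW data;
    const std::wstring pattern = directory + L"\\*";  // match every entry

    HANDLE handle = FindFirstFileW(pattern.c_str(), &data);
    if (handle == INVALID_HANDLE_VALUE) {
        return names;
    }
    do {
        // cFileName is the long name; cAlternateFileName holds the
        // 8.3 alias, if the file has one.
        names.emplace_back(data.cFileName);
    } while (FindNextFileW(handle, &data) != 0);
    FindClose(handle);
#else
    (void)directory;  // unused outside Windows
#endif
    return names;
}
```

The result includes the "." and ".." entries, just as readdir does, so callers will usually want to skip them.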

In C++ you have more options, specifically with Boost. Classes like recursive_directory_iterator allow cross-platform file searching, and they provide UTF-8/UTF-16 file names.

Edit: Just to be absolutely clear, the file name you get back from your original code is correct. Due to backwards compatibility in Windows filesystems (FAT32 and NTFS), every file has two names: the "full", Unicode-aware name, and the "old" 8.3 name from DOS days.

You can absolutely use the 8.3 name if you want; just don't show it to your users or they'll be (correctly) confused. Or just use the proper, modern API to get the real name.
