How to read a file name containing the character 'œ' in C/C++ on Windows
This post is not a duplicate of this one: dirent not working with unicode
I'm using it on a different OS, and my goal is different: the other question only counts the files, whereas I need to read the file names themselves, which is more involved.
I'm trying to extract data from file names on Windows 10. For this I use dirent.h (an external C library, but still very useful in C++).
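```cpp
#include <dirent.h>   // third-party dirent.h port for Windows (not in the Windows SDK)
#include <iostream>

void listFiles(const char* path)
{
    DIR* directory = opendir(path);
    if (directory != NULL)
    {
        struct dirent* direntStruct;
        while ((direntStruct = readdir(directory)) != NULL)
        {
            std::cout << direntStruct->d_name << std::endl;
        }
        closedir(directory);   // release the directory handle
    }
}
```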
This code retrieves the names of all files in a given folder, one by one, and it works well.
But when it encounters a file whose name contains the character 'œ', things go wrong.
Example:
grosse blessure au cœur.txt
is read in my program as:
GUODU0~6.TXT
I can't recover the original data from this string because, as you can see, it bears no resemblance to the actual file name.
I could rename the file, and then it works, but I don't want to do that: I only need to read the data contained in the file name, and that seems impossible. How can I do this?
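On Windows, you can use the wide-character (W) versions of FindFirstFile() or FindFirstFileEx(), followed by FindNextFile(), to read a directory's contents with the file names returned in Unicode.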
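A minimal sketch of that approach, assuming C++ and calling the W functions explicitly so the names come back as UTF-16 whether or not UNICODE is defined. The helper name listDirectoryW is hypothetical, not part of any API, and the code is guarded with #ifdef _WIN32 so it compiles away on other platforms.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <string>
#include <vector>

// Returns the long (Unicode) names of the entries in `dir`, skipping "." and "..".
// `dir` should not end with a backslash. Returns an empty vector if the
// directory cannot be opened (GetLastError() then holds the reason).
std::vector<std::wstring> listDirectoryW(const std::wstring& dir)
{
    std::vector<std::wstring> names;
    std::wstring pattern = dir + L"\\*";   // match every entry in the folder

    WIN32_FIND_DATAW data;
    HANDLE h = FindFirstFileW(pattern.c_str(), &data);
    if (h == INVALID_HANDLE_VALUE)
        return names;

    do {
        std::wstring name = data.cFileName;   // long name, not the 8.3 alias
        if (name != L"." && name != L"..")
            names.push_back(name);
    } while (FindNextFileW(h, &data));

    FindClose(h);   // always release the search handle
    return names;
}
#endif
```

Each returned std::wstring holds the full name, for example L"grosse blessure au cœur.txt", which can then be parsed directly.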
Short File Name
The name you receive is the 8.3 short file name that NTFS generates for non-ASCII file names, so that programs that don't support Unicode can still access them.
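If you already have a path containing such an 8.3 component, the Win32 function GetLongPathNameW can expand it back to the long name. A minimal sketch, assuming C++ on Windows; the helper name toLongPath is hypothetical, and the file must exist for the lookup to succeed.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <string>

// Expands a path that may contain 8.3 components (e.g. "C:\\data\\GUODU0~6.TXT")
// into its long form. Returns an empty string on failure (e.g. file not found).
std::wstring toLongPath(const std::wstring& shortPath)
{
    // With a null buffer, the return value is the size needed,
    // including the terminating null character.
    DWORD needed = GetLongPathNameW(shortPath.c_str(), nullptr, 0);
    if (needed == 0)
        return std::wstring();

    std::wstring longPath(needed, L'\0');
    DWORD written = GetLongPathNameW(shortPath.c_str(), &longPath[0], needed);
    if (written == 0 || written >= needed)   // failed, or path changed in between
        return std::wstring();

    longPath.resize(written);   // drop the terminating null
    return longPath;
}
#endif
```

This only helps after the fact; enumerating with a Unicode-aware API avoids getting the short name in the first place.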
Clinging to dirent
If dirent doesn't support UTF-16, your best bet may be to change your library.
However, depending on the library's implementation, you may have luck with:
- Adding or changing your application's manifest to enable UTF-8 in the char-based Windows APIs. This requires a very recent version of Windows 10; see MSDN: Use the UTF-8 code page.
- Setting the C++ runtime's code page to UTF-8 using setlocale. I do not recommend this, and I don't know whether it will work.
Life is change
Use std::filesystem to enumerate directory contents. A simple example can be found here (see the "Update 2017").
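A minimal C++17 sketch of the std::filesystem approach; the helper name listDirectory is hypothetical and not taken from the linked example. On Windows, fs::path stores the native UTF-16 name, so nothing is lost during enumeration.

```cpp
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;

// Lists the names of the entries in `dir` (files and folders).
// directory_iterator never yields "." or "..".
// Throws fs::filesystem_error if, for example, `dir` does not exist.
std::vector<fs::path> listDirectory(const fs::path& dir)
{
    std::vector<fs::path> names;
    for (const fs::directory_entry& entry : fs::directory_iterator(dir))
        names.push_back(entry.path().filename());
    return names;
}
```

On Windows, prefer .wstring() (or .u8string(), whose return type changed from std::string in C++17 to std::u8string in C++20) over .string(), which typically converts through the ANSI code page and can fail for characters such as 'œ' that it cannot represent.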
Windows only
You can use FindFirstFileW and FindNextFileW, the platform APIs that support UTF-16 strings. However, with std::filesystem there's little reason to do so (at least for your use case).
If you're in C, use the OS functions directly, specifically FindFirstFileW and FindNextFileW. Note the W at the end: you want the wide versions of these functions to get back the full non-ASCII name.
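Since the question ultimately needs the name as an ordinary string to extract data from, the UTF-16 result of the W functions can be converted to UTF-8 with WideCharToMultiByte. A minimal sketch, assuming C++ on Windows (the same calls work from C with a char buffer); the helper name utf16ToUtf8 is hypothetical.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <string>

// Converts a UTF-16 name returned by FindFirstFileW/FindNextFileW into a
// UTF-8 std::string, so it can be searched and parsed like any other string.
std::string utf16ToUtf8(const std::wstring& wide)
{
    if (wide.empty())
        return std::string();

    // First call computes the required size; for CP_UTF8 the last two
    // arguments must be NULL.
    int size = WideCharToMultiByte(CP_UTF8, 0, wide.data(), (int)wide.size(),
                                   nullptr, 0, nullptr, nullptr);
    if (size <= 0)
        return std::string();

    std::string utf8(size, '\0');
    WideCharToMultiByte(CP_UTF8, 0, wide.data(), (int)wide.size(),
                        &utf8[0], size, nullptr, nullptr);
    return utf8;   // explicit length passed, so no terminator is written
}
#endif
```

For example, L"cœur" becomes the five bytes "c\xC5\x93ur".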
In C++ you have more options, particularly with Boost. It has classes like recursive_directory_iterator that allow cross-platform file searching, and they provide UTF-8/UTF-16 file names.
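std::filesystem was standardized from Boost.Filesystem, so C++17 offers std::filesystem::recursive_directory_iterator with a very similar interface and no Boost dependency. A minimal sketch using the standard version (swapped in for Boost here); the helper name listEntriesRecursive is hypothetical.

```cpp
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;

// Collects every entry (files and folders) below `root`, descending into
// sub-folders. Each fs::path keeps the native Unicode name on Windows.
// Throws fs::filesystem_error if, for example, `root` does not exist.
std::vector<fs::path> listEntriesRecursive(const fs::path& root)
{
    std::vector<fs::path> entries;
    // Skip sub-folders we are not allowed to read instead of throwing mid-walk.
    for (const fs::directory_entry& entry :
         fs::recursive_directory_iterator(root, fs::directory_options::skip_permission_denied))
    {
        entries.push_back(entry.path());
    }
    return entries;
}
```

To keep only files, test entry.is_regular_file() inside the loop before storing the path.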
Edit: Just to be absolutely clear, the file name you get back from your original code is correct. For backwards compatibility, Windows filesystems (FAT32 and NTFS) can give a file two names: the "full", Unicode-aware name, and the "old" 8.3 name from DOS days.
You can absolutely use the 8.3 name if you want; just don't show it to your users, or they'll (rightly) be confused. Or simply use the proper, modern API to get the real name.
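Both names are visible in the WIN32_FIND_DATAW structure that FindFirstFileW fills in: cFileName holds the long name and cAlternateFileName the 8.3 alias. A minimal sketch, assuming C++ on Windows; the helper name bothNames is hypothetical.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <string>
#include <utility>

// Returns {long name, 8.3 name} for a single file or folder.
// cAlternateFileName may be empty when no separate short name exists
// (e.g. the long name already fits 8.3, or short names are disabled).
// Returns two empty strings if the path is not found.
std::pair<std::wstring, std::wstring> bothNames(const std::wstring& path)
{
    WIN32_FIND_DATAW data;
    HANDLE h = FindFirstFileW(path.c_str(), &data);
    if (h == INVALID_HANDLE_VALUE)
        return {};
    FindClose(h);
    return { data.cFileName, data.cAlternateFileName };
}
#endif
```

For the file in the question, the second element would be the alias the original code printed, while the first is the name to show users.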