
How to fix garbled text when using ReadFile?

I'm writing a Win32 application. I use ReadFile to read a text file that is written in Unicode, and then display its contents in an EditBox.

const TCHAR FILE_DIRECTORY[] = TEXT("data/");
const TCHAR FILE_LIST[][MAX_LOADSTRING] = { 
    TEXT("fputs_fgets.h"), TEXT("fprintf_fscanf.h"), 
    TEXT("fprintfs_fscanfs.h"), TEXT("fread_fwrite.h"), TEXT("freads_fwrite.h") };
const int FILE_NAME_LENGTH = _tcslen(FILE_LIST[idx]);
const int FILE_DIRECTORY_LENGTH = _tcslen(FILE_DIRECTORY);

TCHAR* filePath = (TCHAR*)calloc(FILE_NAME_LENGTH + FILE_DIRECTORY_LENGTH + 1, sizeof(TCHAR));
_tcscpy_s(filePath, FILE_DIRECTORY_LENGTH + 1, FILE_DIRECTORY);
_tcscat_s(filePath, FILE_NAME_LENGTH + FILE_DIRECTORY_LENGTH + 1, FILE_LIST[idx]);

HANDLE file = CreateFile(filePath, GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
DWORD fileSize = GetFileSize(file, NULL);
DWORD dwRead;

if (editText != NULL)
    free(editText);
editText = (TCHAR*)calloc(1, fileSize + 1);
ReadFile(file, editText, fileSize, &dwRead, NULL);
CloseHandle(file);
free(filePath);

However, some strange characters appear at the end of the output.

        printf("y좌표(정수): %d\n", point.y);
    }

    fclose(file);
}ﴀ﷽ý

How can I fix it? Thank you.

Assuming your file is UTF-16 and you are compiling with _UNICODE defined (reasonable assumptions, given that the rest of your text is read correctly), in this line:

editText = (TCHAR*)calloc(1, fileSize + 1);

you should actually allocate fileSize + sizeof(TCHAR) bytes if you want to rely on the zeroing that calloc does to get a NUL-terminated string. As it is now, you have a wide string whose terminating character has only its low byte set to zero, so the rest of your code goes on reading garbage until it happens to find two consecutive zero bytes (suitably aligned).
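As a minimal sketch of that fix, here is a portable C version of the allocation. It assumes 2-byte code units; the `utf16_unit` typedef stands in for `wchar_t` on Windows, and `make_terminated_utf16` and `utf16_length` are hypothetical helper names, not part of the original code. The `memcpy` takes the place of the `ReadFile` call.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: stand-in for wchar_t on Windows, where a wide character is 2 bytes. */
typedef uint16_t utf16_unit;

/* Hypothetical helper: copy file_size bytes of UTF-16 data into a zeroed
   buffer that has room for a full-width terminator.
   Returns NULL on allocation failure; the caller frees the result. */
utf16_unit *make_terminated_utf16(const void *file_bytes, size_t file_size)
{
    assert(file_size % sizeof(utf16_unit) == 0); /* whole code units only */

    /* The fix: + sizeof(utf16_unit) instead of + 1, so that both bytes
       of the terminating code unit are allocated and zeroed by calloc. */
    utf16_unit *text = calloc(1, file_size + sizeof(utf16_unit));
    if (text == NULL)
        return NULL;

    /* In the question's code, this copy is the ReadFile call. */
    memcpy(text, file_bytes, file_size);
    return text;
}

/* Hypothetical helper: count code units up to the terminator. */
size_t utf16_length(const utf16_unit *text)
{
    size_t n = 0;
    while (text[n] != 0)
        n++;
    return n;
}
```

With the larger allocation, the scan for the terminator stops at the end of the file data instead of running past the buffer.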

Mind you, I'm extremely dubious about this code in general: if you use TCHAR, it means you want to compile both in ANSI (TCHAR == char) and in Unicode (TCHAR == wchar_t), and having that setting change how you interpret the bytes of external files is a questionable idea.
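One way to avoid tying the file's interpretation to the build mode is to decode the bytes according to the file's own encoding. The sketch below assumes the file is UTF-16 little-endian, which the question does not state explicitly; `decode_utf16le` is a hypothetical helper, and `uint16_t` again stands in for a fixed 2-byte wide character type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode UTF-16LE bytes into 16-bit code units,
   independent of whether the program is built as ANSI or Unicode.
   out must have room for byte_count / 2 + 1 units.
   Returns the number of units written, not counting the terminator. */
size_t decode_utf16le(const unsigned char *bytes, size_t byte_count,
                      uint16_t *out)
{
    assert(byte_count % 2 == 0); /* whole code units only */

    size_t units = byte_count / 2;
    for (size_t i = 0; i < units; i++) {
        /* Low byte comes first in little-endian order. */
        out[i] = (uint16_t)(bytes[2 * i] | (bytes[2 * i + 1] << 8));
    }
    out[units] = 0; /* full-width terminator */
    return units;
}
```

The decoded buffer always holds 16-bit units, so the code that displays it can use the wide-character path regardless of how TCHAR is defined.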
