
Reading a text file as bytes (byte by byte) using Delphi 2010

I would like to read a UTF-8 text file byte by byte and get the numeric (decimal) value of each byte in the file. Can this be done? If so, what is the best method?

My goal is then to replace certain two-byte combinations with a single byte (according to a set of rules I have prepared).

For example, if I find a 197 followed by a 158 (decimal values), I will replace the pair with the single byte 17.

I don't want to use the standard Delphi file I/O routines:

AssignFile
Reset
Rewrite
ReadLn
WriteLn
CloseFile

Is there a better method? Can this be done using TStream (Reader & Writer)?
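Streams can do this. The sketch below is one possible approach, not the only one. Reading a TFileStream one byte at a time issues a separate read call per byte, so a real program would load the file into a TMemoryStream with LoadFromFile first. The sketch then copies the data byte by byte into a second stream while applying the 197/158 → 17 rule. It holds back a 197 until it sees the next byte. To stay self-contained it uses four sample bytes instead of a file; the procedure name ReplaceInStream is hypothetical.

```pascal
program StreamReplaceDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  FirstByte = 197;  // lead byte of U+015E in UTF-8
  SecondByte = 158; // continuation byte of U+015E
  NewByte = 17;     // replacement byte chosen in the question

// Hypothetical helper: copies Source to Dest byte by byte,
// turning each FirstByte+SecondByte pair into NewByte.
procedure ReplaceInStream(Source, Dest: TStream);
var
  B, Prev, Replacement: Byte;
  Pending: Boolean; // True while Prev holds a FirstByte not yet written
begin
  Replacement := NewByte;
  Prev := 0;
  Pending := False;
  while Source.Read(B, 1) = 1 do
  begin
    if Pending then
    begin
      Pending := False;
      if B = SecondByte then
      begin
        Dest.WriteBuffer(Replacement, 1); // pair found: one byte replaces two
        Continue;
      end;
      Dest.WriteBuffer(Prev, 1); // not a pair: write the held byte unchanged
    end;
    if B = FirstByte then
    begin
      Prev := B; // hold it until the next byte is known
      Pending := True;
    end
    else
      Dest.WriteBuffer(B, 1);
  end;
  if Pending then
    Dest.WriteBuffer(Prev, 1); // stream ended right after a FirstByte
end;

var
  Source, Dest: TMemoryStream;
  Sample: array[0..3] of Byte;
  B: Byte;
begin
  Source := TMemoryStream.Create;
  Dest := TMemoryStream.Create;
  try
    // For a real file: Source.LoadFromFile(InName) here and
    // Dest.SaveToFile(OutName) afterwards (InName/OutName are placeholders).
    Sample[0] := 65;  // 'A'
    Sample[1] := 197; // U+015E, first byte
    Sample[2] := 158; // U+015E, second byte
    Sample[3] := 66;  // 'B'
    Source.WriteBuffer(Sample, SizeOf(Sample));
    Source.Position := 0;
    ReplaceInStream(Source, Dest);
    Dest.Position := 0;
    while Dest.Read(B, 1) = 1 do
      Write(B, ' '); // 65 17 66
    WriteLn;
  finally
    Dest.Free;
    Source.Free;
  end;
end.
```

In valid UTF-8, the pair 197,158 cannot produce a false match. 197 ($C5) is always a lead byte, and continuation bytes are always in the range 128..191. So the pair is always exactly the encoding of U+015E.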

Here is an example test I am using. I know there is a character (350, i.e. U+015E 'Ş', two bytes) starting in column 84. When viewed in a hex editor, the character consists of 197 + 158, so I am trying to find the 197 using my Delphi code, but I can't seem to find it.

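var
  FS1: TFileStream;
  B: Byte; // Byte, not Char: in Delphi 2010 a Char is 2 bytes, so SizeOf(B) would read two
begin
  FS1 := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
  try
    // Column 84 counted from 1 is byte offset 83 counted from 0, assuming no
    // UTF-8 BOM and only single-byte characters before it
    FS1.Position := 83;
    FS1.ReadBuffer(B, SizeOf(B));
    if B = 197 then ShowMessage('True') else ShowMessage('False');
  finally
    FS1.Free;
  end;
end;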
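Instead of working out the offset from the column number, you can let the program find it. The following is a minimal sketch and assumes the file fits in memory. It reads the whole file into a TBytes array with TFileStream, reports whether the file starts with a UTF-8 BOM (which shifts every offset by 3), and prints the 0-based offset of the first 197,158 pair. LoadFileBytes and FindBytePair are hypothetical helper names.

```pascal
program FindPairOffset;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: reads the entire file into a byte array.
function LoadFileBytes(const FileName: string): TBytes;
var
  FS: TFileStream;
begin
  FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Result, FS.Size);
    if Length(Result) > 0 then
      FS.ReadBuffer(Result[0], Length(Result));
  finally
    FS.Free;
  end;
end;

// Hypothetical helper: 0-based offset of the first (First, Second) pair, or -1.
function FindBytePair(const Data: TBytes; First, Second: Byte): Integer;
var
  I: Integer;
begin
  for I := 0 to Length(Data) - 2 do
    if (Data[I] = First) and (Data[I + 1] = Second) then
      Exit(I);
  Result := -1;
end;

var
  Data: TBytes;
  Offset: Integer;
begin
  Data := LoadFileBytes(ParamStr(1));
  // A UTF-8 BOM (EF BB BF) adds 3 bytes before the first character
  if (Length(Data) >= 3) and (Data[0] = $EF) and (Data[1] = $BB) and (Data[2] = $BF) then
    WriteLn('File starts with a UTF-8 BOM');
  Offset := FindBytePair(Data, 197, 158);
  if Offset >= 0 then
    WriteLn('First 197,158 pair at byte offset ', Offset)
  else
    WriteLn('Pair not found');
end.
```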

You can use TFileStream to read all the data from the file into, for instance, an array of bytes, and then check for UTF-8 sequences. Also note that a UTF-8 sequence can be up to 4 bytes long, not just 2.
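To work with whole UTF-8 sequences rather than single bytes, you can classify each lead byte by its high bits. The following is a minimal sketch and assumes valid UTF-8. Utf8SeqLength and ColumnToByteOffset are hypothetical helper names. The second function walks the sequences to convert a 1-based character column into a 0-based byte offset, which is the value the FS1.Position calculation in the question needs. It assumes a single line with no BOM.

```pascal
program Utf8WalkDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Length of the UTF-8 sequence starting with lead byte B (0 = not a lead byte).
function Utf8SeqLength(B: Byte): Integer;
begin
  if B < $80 then Result := 1          // 0xxxxxxx: ASCII
  else if B < $C0 then Result := 0     // 10xxxxxx: continuation byte
  else if B < $E0 then Result := 2     // 110xxxxx
  else if B < $F0 then Result := 3     // 1110xxxx
  else if B < $F8 then Result := 4     // 11110xxx
  else Result := 0;                    // never valid in UTF-8
end;

// 0-based byte offset of the 1-based character Column, or -1 if out of range.
// Assumes Data holds a single line of valid UTF-8 with no BOM.
function ColumnToByteOffset(const Data: TBytes; Column: Integer): Integer;
var
  Offset, Len, Col: Integer;
begin
  Offset := 0;
  Col := 1;
  while Offset < Length(Data) do
  begin
    if Col = Column then
      Exit(Offset);
    Len := Utf8SeqLength(Data[Offset]);
    if Len = 0 then
      Len := 1; // invalid byte: step over it so the loop always advances
    Inc(Offset, Len);
    Inc(Col);
  end;
  Result := -1;
end;

var
  Data: TBytes;
begin
  // 'x', U+015E (197 158), 'y'
  SetLength(Data, 4);
  Data[0] := Ord('x');
  Data[1] := 197;
  Data[2] := 158;
  Data[3] := Ord('y');
  WriteLn(Utf8SeqLength(Data[1]));      // 2
  WriteLn(ColumnToByteOffset(Data, 2)); // 1
  WriteLn(ColumnToByteOffset(Data, 3)); // 3
end.
```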

Delphi also provides the function Utf8ToUnicode (in the System unit), which converts UTF-8 data into UTF-16 (WideChar) data that you can use as a normal Unicode string.
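Once the data is decoded into a Unicode string, you can search for the character itself instead of its bytes. The following is a minimal sketch. It uses TEncoding.UTF8.GetString and GetBytes (Delphi 2009 and later) instead of the lower-level Utf8ToUnicode, and the sample bytes stand in for the file contents. Replacing U+015E with #17 and encoding back to UTF-8 yields the single byte 17, because UTF-8 encodes every character below 128 as one byte.

```pascal
program StringReplaceDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  Raw, Encoded: TBytes;
  S: string;
  I: Integer;
begin
  // Sample UTF-8 bytes: 'A', U+015E (197 158), 'B'
  SetLength(Raw, 4);
  Raw[0] := 65;
  Raw[1] := 197;
  Raw[2] := 158;
  Raw[3] := 66;

  S := TEncoding.UTF8.GetString(Raw);    // decode into a UTF-16 string
  WriteLn(Pos(#$015E, S));               // 2: a character position, not a byte offset
  S := StringReplace(S, #$015E, #17, [rfReplaceAll]);

  Encoded := TEncoding.UTF8.GetBytes(S); // encode back to UTF-8 (no BOM is added)
  for I := 0 to High(Encoded) do
    Write(Encoded[I], ' ');              // 65 17 66
  WriteLn;
end.
```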

My understanding is that you want to convert a text file from UTF-8 to ASCII. That's quite simple:

StringList.LoadFromFile(UTF8FileName, TEncoding.UTF8);
StringList.SaveToFile(ASCIIFileName, TEncoding.ASCII);
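The two lines above assume an existing StringList. A complete console sketch that also creates and frees the list might look like the following. The file names are placeholders.

```pascal
program Utf8ToAscii;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

var
  StringList: TStringList;
begin
  StringList := TStringList.Create;
  try
    // Decode the input file as UTF-8
    StringList.LoadFromFile('input-utf8.txt', TEncoding.UTF8);
    // Re-encode as ASCII; characters above 127 have no ASCII form and are lost
    StringList.SaveToFile('output-ascii.txt', TEncoding.ASCII);
  finally
    StringList.Free;
  end;
end.
```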

The runtime library already provides extensive support for converting between text encodings (the TEncoding class, for example). Surely you don't want to replicate that functionality yourself?

I trust you realise that this conversion is liable to lose data. Characters with an ordinal greater than 127 cannot be represented in ASCII; equivalently, no code point that needs more than one octet in UTF-8 has an ASCII representation.
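To see why, here is the standard UTF-8 encoding rule written out for a single code point. This is a minimal sketch. EncodeCodePoint is a hypothetical helper, and it does not reject surrogates or values above $10FFFF. For 350 ($015E, the character in the question), it produces the two bytes 197 and 158. Any value above 127 falls into a branch that emits at least two bytes.

```pascal
program CodePointToUtf8;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: UTF-8 bytes for one code point (no validation).
function EncodeCodePoint(CP: Cardinal): TBytes;
begin
  if CP < $80 then
  begin
    SetLength(Result, 1);
    Result[0] := CP;                           // 0xxxxxxx
  end
  else if CP < $800 then
  begin
    SetLength(Result, 2);
    Result[0] := $C0 or (CP shr 6);            // 110xxxxx
    Result[1] := $80 or (CP and $3F);          // 10xxxxxx
  end
  else if CP < $10000 then
  begin
    SetLength(Result, 3);
    Result[0] := $E0 or (CP shr 12);           // 1110xxxx
    Result[1] := $80 or ((CP shr 6) and $3F);
    Result[2] := $80 or (CP and $3F);
  end
  else
  begin
    SetLength(Result, 4);
    Result[0] := $F0 or (CP shr 18);           // 11110xxx
    Result[1] := $80 or ((CP shr 12) and $3F);
    Result[2] := $80 or ((CP shr 6) and $3F);
    Result[3] := $80 or (CP and $3F);
  end;
end;

var
  Bytes: TBytes;
  I: Integer;
begin
  Bytes := EncodeCodePoint(350);
  for I := 0 to High(Bytes) do
    Write(Bytes[I], ' '); // 197 158
  WriteLn;
end.
```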

You asked the same question five hours later in another topic, and the answer there addresses your specific question better:

Replacing a unicode character in UTF-8 file using delphi 2010
