
Unexpected characters in file when using System.IO.FileStream.WriteByte() from within Powershell

Please consider the following Powershell script:

Out-File -FilePath "c:\batch\vss\test.txt" -NoNewline -InputObject "0"

$Stream = [System.IO.File]::Open("c:\batch\vss\test.txt",
                                 [System.IO.FileMode]::Open,
                                 [System.IO.FileAccess]::Write,
                                 [System.IO.FileShare]::ReadWrite)
$Stream.WriteByte(49)

$Stream.Dispose()

After running the script, the file c:\batch\vss\test.txt contains unexpected characters. It contains the following 4 bytes (in hex):

31 fe 30 00

From these, only the first one is expected ( 0x31 = 49 ). The others are unexpected. [The last sentence is wrong - see section UPDATE / SOLUTION]

I have verified that this file, after having executed the first line of the script, contains exactly one byte ( 0x30 , which is the ASCII code of '0' ). So it definitely is the .WriteByte() which adds the additional characters. [This statement is wrong - see section UPDATE / SOLUTION]

The weird thing is that .WriteByte() partly works as expected: it overwrites the first byte in the file, which is 0x30 after the first line of the script has run, with 0x31 .

But why does it add the other three bytes, and how do I prevent this? Am I using the .NET library in a wrong way from within Powershell (Powershell newbie here ...)? [This question is invalid since .WriteByte() does not add the other three bytes - see section UPDATE / SOLUTION]

UPDATE / SOLUTION

Mathias R. Jessen's answer is completely correct. However, I'd like to briefly explain why I didn't see this myself (even though I already knew about the byte order mark):

I used Notepad++ in combination with the Hex Editor Plugin to investigate what happened. Apparently, this combination has problems updating the hex view of an open file when the file is altered, which is what put me on the wrong track. This is the second time this problem has bitten me, so I'll definitely use other hex editors in the future.

After the answer had been written, I re-investigated, this time using HxD, and immediately saw what was going on.

To make a long story short: This actually was not a problem with System.IO.FileStream.WriteByte() , but a problem with the tools used for investigation.

The solution is simple: If I want one byte in the file, I can use Out-File with another encoding, or I can use .WriteByte() to create the file in the first place.
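Both options can be sketched as follows. This is a minimal illustration rather than the original script: the scratch file name in the temp directory is hypothetical, and `FileMode.Create` is used here so that the stream itself creates (or replaces) the file.

```powershell
# Hypothetical scratch file; the original script used c:\batch\vss\test.txt
$Path = Join-Path ([System.IO.Path]::GetTempPath()) "writebyte-ascii-test.txt"

# Option 1: have Out-File use a single-byte encoding, which writes no byte order mark
Out-File -FilePath $Path -Encoding ascii -NoNewline -InputObject "0"
$FromOutFile = [System.IO.File]::ReadAllBytes($Path)

# Option 2: skip Out-File and let the stream create the file
$Stream = [System.IO.File]::Open($Path,
                                 [System.IO.FileMode]::Create,
                                 [System.IO.FileAccess]::Write,
                                 [System.IO.FileShare]::None)
try {
    $Stream.WriteByte(49)   # 0x31, the ASCII code of '1'
}
finally {
    $Stream.Dispose()
}
$FromStream = [System.IO.File]::ReadAllBytes($Path)

Remove-Item -Path $Path
```

In both cases the file should end up exactly one byte long.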

In Windows PowerShell, Out-File defaults to little-endian UTF-16 encoding (colloquially known as Unicode encoding in Windows).

When you execute Out-File with the value "0" , it writes the following byte sequence (in hex) to the file on disk:

ff fe 30 00
\___/ \___/
  |     | 
  |    UTF-16LE encoded "0"
UTF-16LE byte order mark
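
One way to check this without relying on a hex editor is to read the raw bytes back from within PowerShell. The sketch below assumes a hypothetical scratch file in the temp directory and passes `-Encoding unicode` explicitly, so the result does not depend on which PowerShell version's default encoding is in effect.

```powershell
# Hypothetical scratch file used only for this check
$Path = Join-Path ([System.IO.Path]::GetTempPath()) "writebyte-bom-test.txt"

# "unicode" is UTF-16LE with a byte order mark
Out-File -FilePath $Path -Encoding unicode -NoNewline -InputObject "0"

# Read the raw bytes and format each one as two lowercase hex digits
$Bytes = [System.IO.File]::ReadAllBytes($Path)
$Hex = ($Bytes | ForEach-Object { $_.ToString("x2") }) -join " "
$Hex   # expected: ff fe 30 00

Remove-Item -Path $Path
```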

When you call [File]::Open() with [FileMode]::Open , it returns a FileStream object pointing to the lowest offset in the file, so WriteByte(49) ends up overwriting the first part of the byte-order mark:

31 fe 30 00
 |
Overwritten
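
The stream's Position and Length properties make this visible. The following is a rough sketch under the same assumptions as the question (a UTF-16LE file containing "0"), using a hypothetical scratch file instead of the original path.

```powershell
# Hypothetical scratch file, written as UTF-16LE with a byte order mark
$Path = Join-Path ([System.IO.Path]::GetTempPath()) "writebyte-position-test.txt"
Out-File -FilePath $Path -Encoding unicode -NoNewline -InputObject "0"

$Stream = [System.IO.File]::Open($Path,
                                 [System.IO.FileMode]::Open,
                                 [System.IO.FileAccess]::Write,
                                 [System.IO.FileShare]::ReadWrite)
try {
    $PositionBefore = $Stream.Position   # the stream starts at offset 0
    $Stream.WriteByte(49)                # overwrites the byte at offset 0
    $PositionAfter  = $Stream.Position   # the stream has advanced by one byte
    $LengthAfter    = $Stream.Length     # the file keeps its original length
}
finally {
    $Stream.Dispose()
}

# Read the file back to see which bytes survived
$Bytes = [System.IO.File]::ReadAllBytes($Path)
$Hex = ($Bytes | ForEach-Object { $_.ToString("x2") }) -join " "

Remove-Item -Path $Path
```

Only the first byte changes; the second half of the byte order mark and the encoded "0" stay in place.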

If you always want to overwrite any data in an existing file, use [FileMode]::Truncate :

$Stream = [System.IO.File]::Open("c:\batch\vss\test.txt",
                                 [System.IO.FileMode]::Truncate,
                                 [System.IO.FileAccess]::Write,
                                 [System.IO.FileShare]::ReadWrite)

If you want to truncate the file to get rid of the excess data manually, use SetLength :

$Stream.SetLength(1)
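
The two approaches can be compared side by side. This is an illustrative sketch only: the scratch file name is hypothetical, and each variant first recreates the 4-byte UTF-16LE file before writing to it.

```powershell
# Hypothetical scratch file reused by both variants
$Path = Join-Path ([System.IO.Path]::GetTempPath()) "writebyte-truncate-test.txt"

# Variant A: FileMode.Truncate empties the existing file when it is opened
Out-File -FilePath $Path -Encoding unicode -NoNewline -InputObject "0"
$Stream = [System.IO.File]::Open($Path,
                                 [System.IO.FileMode]::Truncate,
                                 [System.IO.FileAccess]::Write,
                                 [System.IO.FileShare]::ReadWrite)
try {
    $Stream.WriteByte(49)
}
finally {
    $Stream.Dispose()
}
$AfterTruncate = [System.IO.File]::ReadAllBytes($Path)

# Variant B: open without truncating, then cut the file down with SetLength
Out-File -FilePath $Path -Encoding unicode -NoNewline -InputObject "0"
$Stream = [System.IO.File]::Open($Path,
                                 [System.IO.FileMode]::Open,
                                 [System.IO.FileAccess]::Write,
                                 [System.IO.FileShare]::ReadWrite)
try {
    $Stream.WriteByte(49)
    $Stream.SetLength(1)   # drops the leftover bytes fe 30 00
}
finally {
    $Stream.Dispose()
}
$AfterSetLength = [System.IO.File]::ReadAllBytes($Path)

Remove-Item -Path $Path
```

Either way, the file should contain only the single byte 0x31 afterwards.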
