Hi Stackoverflow community!
I have a .csv file in which some values are "{Null}" and "Null". I use a batch file (.cmd) that calls PowerShell to replace those values with "". The problem is that the output file has a different encoding (UTF-16LE) than the input (UTF-8). Is there a way to keep the original encoding?
powershell -Command "(gc myfile.csv) -replace '{NULL}', '' | Out-File myfile_replaced.csv"
I searched for a solution and understood that Notepad uses UTF-16LE by default. In theory I could change the encoding in Notepad++, but that is not an option because the code will be shared with others.
This also has to be implemented in a batch file; otherwise I could simply search and replace the values manually.
Out-File supports -Encoding as a parameter. This is also true for various other cmdlets that write files (e.g. Export-Csv).
As per the documentation for Export-Csv (PowerShell 6 and later):
-Encoding
Specifies the encoding for the exported CSV file. The default value is UTF8NoBOM.
The acceptable values for this parameter are as follows:
- ASCII: Uses the encoding for the ASCII (7-bit) character set.
- BigEndianUnicode: Encodes in UTF-16 format using the big-endian byte order.
- OEM: Uses the default encoding for MS-DOS and console programs.
- Unicode: Encodes in UTF-16 format using the little-endian byte order.
- UTF7: Encodes in UTF-7 format.
- UTF8: Encodes in UTF-8 format.
- UTF8BOM: Encodes in UTF-8 format with Byte Order Mark (BOM)
- UTF8NoBOM: Encodes in UTF-8 format without Byte Order Mark (BOM)
- UTF32: Encodes in UTF-32 format.
Beginning with PowerShell 6.2, the Encoding parameter also allows numeric IDs of registered code pages (like -Encoding 1251) or string names of registered code pages (like -Encoding "windows-1251"). For more information, see the .NET documentation for Encoding.CodePage.
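Applied to the replacement from the question, the -Encoding parameter might be used roughly as in the sketch below. It is only an illustration under a few assumptions: Remove-NullMarker is a hypothetical helper name (not a built-in cmdlet), the input is assumed to be UTF-8, and in Windows PowerShell 5.1 "Out-File -Encoding UTF8" writes a BOM, whereas PowerShell 6 and later write UTF-8 without one.

```powershell
# Sketch: replace {NULL} markers and write the result as UTF-8 via Out-File.
# Remove-NullMarker is a hypothetical helper name, not a built-in cmdlet.
function Remove-NullMarker {
    param(
        [Parameter(Mandatory)] [string] $Path,         # input CSV, assumed to be UTF-8
        [Parameter(Mandatory)] [string] $Destination   # output CSV
    )

    # Read explicitly as UTF-8; Windows PowerShell 5.1 would otherwise
    # treat a BOM-less file as ANSI and garble non-ASCII characters.
    $lines = Get-Content -Path $Path -Encoding UTF8

    # -replace takes a case-insensitive regex; the braces are escaped
    # so that they match literally.
    $lines -replace '\{NULL\}', '' |
        Out-File -FilePath $Destination -Encoding UTF8
}
```

With the file names from the question, a call would look like: Remove-NullMarker -Path myfile.csv -Destination myfile_replaced.csv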
Unfortunately, in Windows PowerShell (the version that the powershell command starts), Out-File, ">" and ">>" default to "Unicode", i.e. UTF-16LE. You can even end up with two encodings in the same file when appending with ">>" or "Out-File -Append". You can use Set-Content instead, or "Out-File -Encoding utf8" (which in Windows PowerShell writes a BOM). Set-Content actually defaults to ANSI encoding, but as long as the data contains only ASCII characters the result is byte-for-byte identical to UTF-8 without a BOM; alternatively, you can pass -Encoding to Set-Content as well. Notepad defaults to ANSI, but can usually recognize UTF-8 or UTF-16 even without a BOM or other encoding signature.
powershell -Command "(gc myfile.csv) -replace '{NULL}', '' | set-content myfile_replaced.csv"
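If the data does contain non-ASCII characters and the output must be UTF-8 without a BOM, one possible workaround in Windows PowerShell is to read and write through the .NET System.IO.File methods. The following is a minimal sketch under that assumption, not the only way to do it; Convert-NullMarkerNoBom is a hypothetical helper name, and the input is assumed to be UTF-8.

```powershell
# Sketch: same replacement, but read and written as UTF-8 without a BOM via .NET.
# Convert-NullMarkerNoBom is a hypothetical helper name.
function Convert-NullMarkerNoBom {
    param(
        [Parameter(Mandatory)] [string] $Path,         # input CSV, assumed to be UTF-8
        [Parameter(Mandatory)] [string] $Destination   # output CSV
    )

    # .NET resolves relative paths against the process working directory,
    # not the PowerShell location, so build full paths first.
    $source = (Resolve-Path -LiteralPath $Path).ProviderPath
    $target = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Destination)

    # Passing $false to the constructor means "do not emit a byte order mark".
    $utf8NoBom = New-Object System.Text.UTF8Encoding $false

    $lines = [System.IO.File]::ReadAllLines($source, $utf8NoBom)

    # -replace is a case-insensitive regex operator; escape the braces to match them literally.
    $cleaned = [string[]]@($lines -replace '\{NULL\}', '')

    [System.IO.File]::WriteAllLines($target, $cleaned, $utf8NoBom)
}
```

For use from a batch file, the function and a call to it could be saved in a .ps1 script and started with "powershell -NoProfile -File", or condensed into a single "powershell -Command" line.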