
How to keep UTF-8 in batch for csv file?

Hi Stackoverflow community!

I have a .csv file containing the values "{Null}" and "Null". I use a batch file (.cmd) that calls PowerShell to replace those values with "". The issue is that the output file has a different encoding (UTF-16LE) than the input (UTF-8). Is there a way to keep the original encoding?

powershell -Command "(gc myfile.csv) -replace '{NULL}', '' | Out-File myfile_replaced.csv"

I tried to find a solution and understood that Notepad uses UTF-16LE encoding by default. In theory I could change the encoding in Notepad++, but that is not an option, as the code is meant to be shared with others.

And this has to be implemented in batch; otherwise I could simply search and replace the values manually.

Out-File supports an -Encoding parameter, as do various other cmdlets that write files (e.g. Export-Csv).

As per the Export-Csv documentation (PowerShell 6 and later):

-Encoding

Specifies the encoding for the exported CSV file. The default value is UTF8NoBOM.

The acceptable values for this parameter are as follows:

  • ASCII: Uses the encoding for the ASCII (7-bit) character set.
  • BigEndianUnicode: Encodes in UTF-16 format using the big-endian byte order.
  • OEM: Uses the default encoding for MS-DOS and console programs.
  • Unicode: Encodes in UTF-16 format using the little-endian byte order.
  • UTF7: Encodes in UTF-7 format.
  • UTF8: Encodes in UTF-8 format.
  • UTF8BOM: Encodes in UTF-8 format with Byte Order Mark (BOM).
  • UTF8NoBOM: Encodes in UTF-8 format without Byte Order Mark (BOM).
  • UTF32: Encodes in UTF-32 format.

Beginning with PowerShell 6.2, the Encoding parameter also allows numeric IDs of registered code pages (like -Encoding 1251) or string names of registered code pages (like -Encoding "windows-1251"). For more information, see the .NET documentation for Encoding.CodePage.
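
Applied to the command from the question, this could look roughly like the sketch below. It assumes Windows PowerShell 5.1 (what powershell.exe launches), where Out-File defaults to UTF-16LE, "-Encoding UTF8" writes UTF-8 with a BOM, and the UTF8NoBOM value is not accepted. The "-Encoding UTF8" on gc is an addition, not part of the original command; it is there so that a BOM-less UTF-8 input is not decoded as ANSI.

```powershell
rem Inside the .cmd file: read as UTF-8, replace, write as UTF-8 (with BOM in Windows PowerShell 5.1)
powershell -Command "(gc myfile.csv -Encoding UTF8) -replace '{NULL}', '' | Out-File myfile_replaced.csv -Encoding UTF8"
```

With PowerShell 6 or later (pwsh.exe), UTF-8 without a BOM is already the default, so the -Encoding arguments should not be needed there.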

Unfortunately, in Windows PowerShell, Out-File and the ">" and ">>" redirection operators default to "Unicode", i.e. UTF-16LE. You can even end up with two encodings mixed in the same file when using ">>" or "Out-File -Append". Use Set-Content instead, or "Out-File -Encoding utf8". Strictly speaking, Set-Content defaults to ANSI encoding, but as long as the data contains only ASCII characters the result is the same as UTF-8 without a BOM; Set-Content also accepts an -Encoding parameter. Notepad defaults to ANSI, but it can recognize UTF-8 or Unicode even without BOMs or encoding signatures.

powershell -Command "(gc myfile.csv) -replace '{NULL}', '' | set-content myfile_replaced.csv"
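
If the file does contain non-ASCII characters and the output has to be UTF-8 without a BOM, one possible approach in Windows PowerShell 5.1 is to bypass the cmdlets and write through .NET. This is only a sketch under that assumption: the $lines variable is illustrative, and the full path is built with Join-Path because .NET's current directory is not guaranteed to match PowerShell's current location.

```powershell
rem Inside the .cmd file: read as UTF-8, replace, then write UTF-8 without a BOM via .NET
powershell -Command "[string[]]$lines = (gc myfile.csv -Encoding UTF8) -replace '{NULL}', ''; [System.IO.File]::WriteAllLines((Join-Path $PWD.Path 'myfile_replaced.csv'), $lines, (New-Object System.Text.UTF8Encoding $false))"
```

The $false passed to the UTF8Encoding constructor is what suppresses the BOM. As with Set-Content, every line is written with a trailing line break.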
