
Is there a way to check encoding of files through Install script or batch?

Is there a way, using Install script, a Windows batch file, or PowerShell, to check whether a file is already UTF-8 before passing it on for conversion?

As background, I am currently working on legacy (Japanese) Windows software written in C++, originally developed with Visual Studio 2005 and since upgraded to Visual Studio 2017.

I have a requirement to make the GUI able to display and accept input of Chinese characters, hence the decision to use Unicode as the project/solution character set.

Since the project originally used Multibyte, and to stay compatible with the Unicode build, I decided to encode the configuration files (ini, dat, and save files) in UTF-8, as these files are also referenced by a web application.

The main parts of the software are now done and working, and I am left with one last problem: rolling out a version-upgrade installer.

In this installer (written in Install script), I need to convert the save files to UTF-8. They were previously encoded in Shift-JIS because they contain Japanese text.

I have already created the batch file below, which converts Shift-JIS to UTF-8. It is called in the last step of the installer and is deleted after the conversion.

@echo off
:: Shift_JIS -> UTF-8
setlocal enabledelayedexpansion
for %%f in ("%~dp0savedfiles\*.sav") do (
    rem Keep only paths that really end in .sav
    echo %%~ff| findstr /l /e /i ".sav" >nul
    if !ERRORLEVEL! equ 0 (
        rem Read as code page 932 (Shift-JIS); WriteAllText writes UTF-8 by default
        powershell -nop -c "&{[IO.File]::WriteAllText($args[1], [IO.File]::ReadAllText($args[0], [Text.Encoding]::GetEncoding(932)))}" \"%%~ff\" \"%%~ff\"
    )
)

However, the problem is that when the user (1) upgrades, (2) uninstalls (the .sav files are left behind on purpose), and (3) re-installs the software, the save files are re-encoded a second time and the software crashes. (The Japanese characters converted to UTF-8 during the upgrade in (1) become garbage characters after the re-installation in (3).)

If you're upgrading, then all the current files should be in Shift-JIS. Even if some situations leave Shift-JIS and UTF-8 files side by side, there are only two encodings you need to handle. You can therefore work around the problem with a simple check: if a file is not valid UTF-8, treat it as Shift-JIS. This is still subject to incorrect detection in some rare cases, because a Shift-JIS byte sequence can happen to be valid UTF-8 as well, but it should be good enough for your use case.

By default, .NET uses a best-fit or replacement fallback handler when reading text files, so bytes that are invalid in the chosen encoding are silently substituted rather than reported. We can switch to an exception fallback so that an exception is thrown when a Shift-JIS file is opened as UTF-8:

# $f is the full path of the file to check
try {
    $t = [IO.File]::ReadAllText($f, [Text.Encoding]::GetEncoding(65001, `
         (New-Object Text.EncoderExceptionFallback), `
         (New-Object Text.DecoderExceptionFallback)))
} catch {
    # File is not UTF-8, reopen as Shift-JIS
    $t = [IO.File]::ReadAllText($f, [Text.Encoding]::GetEncoding(932))
}

# Write the file as UTF-8
[IO.File]::WriteAllText($f, $t)
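
The rare misdetection mentioned above can be reproduced with two bytes. This is only a sketch, assuming Windows PowerShell where code page 932 is available; the variable names are illustrative. The bytes 0xC2 0xB1 are two half-width katakana in Shift-JIS, but they are also the valid UTF-8 encoding of U+00B1 (the plus-minus sign), so a strict UTF-8 decode does not throw.

```powershell
# Two bytes that are valid in both encodings
$bytes = [byte[]](0xC2, 0xB1)

# UTF8Encoding(encoderShouldEmitUTF8Identifier, throwOnInvalidBytes)
$utf8Strict = New-Object Text.UTF8Encoding($false, $true)
$sjis = [Text.Encoding]::GetEncoding(932)

$asUtf8 = $utf8Strict.GetString($bytes)   # no exception is thrown here
$asSjis = $sjis.GetString($bytes)         # two half-width katakana

# Print the code points each decoder produced
'UTF-8    : ' + (($asUtf8.ToCharArray() | ForEach-Object { '{0:X4}' -f [int]$_ }) -join ' ')
'Shift-JIS: ' + (($asSjis.ToCharArray() | ForEach-Object { '{0:X4}' -f [int]$_ }) -join ' ')
```

A file would have to consist entirely of such coincidentally valid sequences to be misdetected, which is why the check is usually adequate for save files that contain ordinary Japanese text.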

It's better to loop through the files and do the conversion entirely in PowerShell. If you really need to use a batch file, wrap everything in a *.ps1 file and call that from the batch file.
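
As a rough sketch of that loop, the script below checks each .sav file and rewrites only those that are not valid UTF-8. The file name convert-sav.ps1 and the -Dir parameter are hypothetical, and the script assumes every non-UTF-8 file is Shift-JIS (code page 932), as in the question. It uses a throwing UTF8Encoding instead of the fallback objects shown above; both make the decoder raise an exception on invalid bytes.

```powershell
# convert-sav.ps1 (hypothetical name): convert Shift-JIS save files to UTF-8,
# skipping files that already decode cleanly as UTF-8.
param(
    [Parameter(Mandatory = $true)]
    [string]$Dir
)

# No BOM on output; throw on invalid bytes when decoding
$utf8Strict = New-Object Text.UTF8Encoding($false, $true)
$sjis = [Text.Encoding]::GetEncoding(932)

$files = Get-ChildItem -LiteralPath $Dir -Filter *.sav |
    Where-Object { -not $_.PSIsContainer -and $_.Extension -eq '.sav' }

foreach ($file in $files) {
    $bytes = [IO.File]::ReadAllBytes($file.FullName)

    $isUtf8 = $true
    try {
        [void]$utf8Strict.GetString($bytes)
    } catch {
        # Invalid UTF-8 byte sequence: assume the file is still Shift-JIS
        $isUtf8 = $false
    }

    if (-not $isUtf8) {
        $text = $sjis.GetString($bytes)
        [IO.File]::WriteAllText($file.FullName, $text, $utf8Strict)
    }
}
```

From the installer's batch file this could be invoked with something like powershell -NoProfile -ExecutionPolicy Bypass -File "%~dp0convert-sav.ps1" -Dir "%~dp0savedfiles". Because files that are already UTF-8 are left untouched, running the script again after a re-installation should not re-encode them.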


 