Get the encoding of a file in Windows

Question

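This isn't really a programming question: is there a command-line or Windows tool (Windows 7) to get the current encoding of a text file? Sure, I can write a little C# app, but I wanted to know whether something is already built in.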

Answer 1

Open your file using the regular old vanilla Notepad that comes with Windows.
It will show you the encoding of the file when you click "Save As...".

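Whatever encoding is selected by default, that is your file's current encoding.
If it is UTF-8, you can change it to ANSI and click Save to change the encoding (or vice versa).

I realize there are many different types of encoding, but this was all I needed when I was informed that our export files were in UTF-8 and they required ANSI. It was a one-time export, so Notepad fit the bill for me.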
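
For a conversion that has to be repeated, the same re-save can be scripted. The following is a minimal Python sketch, not part of the original answer. It assumes that "ANSI" means Windows-1252 (`cp1252`), which only holds on Western European and US systems; the function name `reencode` is hypothetical.

```python
def reencode(src, dst, src_encoding="utf-8-sig", dst_encoding="cp1252"):
    """Re-save a text file in another encoding, like Notepad's Save As.

    "utf-8-sig" reads UTF-8 with or without a BOM. "cp1252" is an
    assumption: Notepad's "ANSI" is whatever the system code page is.
    """
    # newline="" keeps CRLF/LF line endings exactly as they are in the source.
    with open(src, "r", encoding=src_encoding, newline="") as f:
        text = f.read()
    # Raises UnicodeEncodeError if a character has no equivalent in the
    # target encoding, rather than silently replacing it.
    with open(dst, "w", encoding=dst_encoding, newline="") as f:
        f.write(text)
```

Calling `reencode("export_utf8.txt", "export_ansi.txt")` (hypothetical file names) writes the converted copy and leaves the source file untouched.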

FYI: From my understanding, "Unicode" (as listed in Notepad) is a misnomer for UTF-16.
More on Notepad's "Unicode" option: Windows 7 - UTF-8 and Unicode
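
To see what that label means at the byte level, here is a small Python illustration (added here, not from the original answer). It assumes Notepad's "Unicode" option writes UTF-16 little-endian with a BOM and its "Unicode big endian" option writes the big-endian form; `notepad_unicode_bytes` is a hypothetical helper name.

```python
import codecs


def notepad_unicode_bytes(text, big_endian=False):
    """Encode text as UTF-16 with an explicit BOM in the chosen byte order."""
    if big_endian:
        return codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


print(notepad_unicode_bytes("Hi").hex(" "))        # ff fe 48 00 69 00
print(notepad_unicode_bytes("Hi", True).hex(" "))  # fe ff 00 48 00 69
print("Hi".encode("utf-8").hex(" "))               # 48 69
```

The two-byte-per-character layout is why a UTF-16 file is roughly twice the size of the same ASCII text saved as UTF-8 or ANSI.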

Answer 2

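If you have "git" or "Cygwin" on your Windows machine, go to the folder where your file is located and execute the command:

file *

This will give you the encoding details of all the files in that folder.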

Answer 3

The (Linux) command-line tool "file" is available on Windows via GnuWin32:

http://gnuwin32.sourceforge.net/packages/file.htm

If you have git installed, it's located in C:\Program Files\git\usr\bin.

Example:

    C:\Users\SH\Downloads\SquareRoot>file *
    _UpgradeReport_Files;         directory
    Debug;                        directory
    duration.h;                   ASCII C++ program text, with CRLF line terminators
    ipch;                         directory
    main.cpp;                     ASCII C program text, with CRLF line terminators
    Precision.txt;                ASCII text, with CRLF line terminators
    Release;                      directory
    Speed.txt;                    ASCII text, with CRLF line terminators
    SquareRoot.sdf;               data
    SquareRoot.sln;               UTF-8 Unicode (with BOM) text, with CRLF line terminators
    SquareRoot.sln.docstates.suo; PCX ver. 2.5 image data
    SquareRoot.suo;               CDF V2 Document, corrupt: Cannot read summary info
    SquareRoot.vcproj;            XML  document text
    SquareRoot.vcxproj;           XML document text
    SquareRoot.vcxproj.filters;   XML document text
    SquareRoot.vcxproj.user;      XML document text
    squarerootmethods.h;          ASCII C program text, with CRLF line terminators
    UpgradeLog.XML;               XML  document text

    C:\Users\SH\Downloads\SquareRoot>file --mime-encoding *
    _UpgradeReport_Files;         binary
    Debug;                        binary
    duration.h;                   us-ascii
    ipch;                         binary
    main.cpp;                     us-ascii
    Precision.txt;                us-ascii
    Release;                      binary
    Speed.txt;                    us-ascii
    SquareRoot.sdf;               binary
    SquareRoot.sln;               utf-8
    SquareRoot.sln.docstates.suo; binary
    SquareRoot.suo;               CDF V2 Document, corrupt: Cannot read summary infobinary
    SquareRoot.vcproj;            us-ascii
    SquareRoot.vcxproj;           utf-8
    SquareRoot.vcxproj.filters;   utf-8
    SquareRoot.vcxproj.user;      utf-8
    squarerootmethods.h;          us-ascii
    UpgradeLog.XML;               us-ascii
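
When a folder holds many files, it can help to group this output by encoding so the odd ones stand out. The following Python sketch is an addition, not part of the original answer. It assumes one result per line with the file name separated from the value by `;` (as in the GnuWin32 output above) or `:` (as other builds of `file` print), followed by whitespace; `group_by_encoding` is a hypothetical name, and file names that themselves contain "; " or ": " are not handled.

```python
import re
from collections import defaultdict

# One line looks like "main.cpp;      us-ascii" or "main.cpp: us-ascii".
LINE_RE = re.compile(r"^(?P<name>.+?)[;:]\s+(?P<value>\S.*)$")


def group_by_encoding(output):
    """Map each reported encoding to the list of file names that have it."""
    groups = defaultdict(list)
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            groups[match.group("value").strip()].append(match.group("name"))
    return dict(groups)


sample = """\
duration.h;                   us-ascii
SquareRoot.sln;               utf-8
SquareRoot.vcxproj;           utf-8
SquareRoot.sdf;               binary
"""
for encoding, names in sorted(group_by_encoding(sample).items()):
    print(encoding, names)
```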

Answer 4

Another tool that I found useful: https://archive.codeplex.com/?p=encodingchecker
The EXE can be found here.

Answer 5

Install git (on Windows you have to use the git bash console). Type:

file --mime-encoding *

for all files in the current directory, or

file --mime-encoding */*

for the files in all subdirectories.
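
Note that the `*/*` glob only reaches files exactly one level below the current directory. If deeper nesting matters, the file list can be collected recursively first. This is a minimal Python sketch under assumptions: it is an addition to the answer, the names `files_under` and `file_command` are hypothetical, and `file` is assumed to be reachable under the name passed as `tool`.

```python
from pathlib import Path


def files_under(root):
    """Return every regular file below root, at any depth, sorted by path."""
    return sorted(p for p in Path(root).rglob("*") if p.is_file())


def file_command(root, tool="file"):
    """Build the argument list for one `file --mime-encoding` call over root."""
    return [tool, "--mime-encoding", *map(str, files_under(root))]
```

The resulting list can be handed to `subprocess.run(file_command("."), capture_output=True, text=True)`. For very large trees the argument list may exceed the operating system's command-line length limit, so the paths would have to be passed in batches.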

Answer 6

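Here's my take on how to detect the Unicode family of text encodings via BOM. The accuracy of this method is low: it only works on text files (specifically Unicode files), and it defaults to ascii when no BOM is present (like most text editors; the default would be UTF8 if you want to match the HTTP/web ecosystem).

Update 2018: I no longer recommend this method. I recommend using file.exe from GIT or the *nix tools as recommended by @Sybren, and I show how to do that via PowerShell in a later answer.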

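# from https://gist.github.com/zommarin/1480974
function Get-FileEncoding($Path) {
    # Read the first four bytes. -Encoding byte is Windows PowerShell 5.1 syntax;
    # PowerShell 6+ uses -AsByteStream instead.
    $bytes = [byte[]](Get-Content $Path -Encoding byte -ReadCount 4 -TotalCount 4)

    # Empty file: nothing to inspect, so report utf8.
    if(!$bytes) { return 'utf8' }

    # Compare the bytes, rendered as lowercase hex, against the known BOMs.
    # The UTF-32 signatures are tested first because fffe0000 also starts with fffe.
    switch -regex ('{0:x2}{1:x2}{2:x2}{3:x2}' -f $bytes[0],$bytes[1],$bytes[2],$bytes[3]) {
        '^fffe0000' { return 'utf32' }
        '^0000feff' { return 'bigendianutf32' }
        '^efbbbf'   { return 'utf8' }
        '^2b2f76'   { return 'utf7' }
        '^fffe'     { return 'unicode' }
        '^feff'     { return 'bigendianunicode' }
        default     { return 'ascii' }
    }
}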

dir ~\Documents\WindowsPowershell -File | 
    select Name,@{Name='Encoding';Expression={Get-FileEncoding $_.FullName}} | 
    ft -AutoSize

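Recommendation: This can work reasonably well if the dir, ls, or Get-ChildItem only checks known text files, and when you're only looking for "bad encodings" from a known list of tools. (i.e. SQL Management Studio defaults to UTF16, which broke GIT auto-cr-lf for Windows, which was the default for many years.)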
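
For comparison, the same BOM sniffing can be sketched in Python. This is an illustrative addition, not the author's code; `detect_bom` and `detect_file_bom` are hypothetical names. One detail matters in any implementation of this idea: the four-byte UTF-32 LE signature (FF FE 00 00) begins with the two-byte UTF-16 LE signature (FF FE), so the longer signatures have to be tested first.

```python
import codecs

# Longest signatures first, so UTF-32 LE is not mistaken for UTF-16 LE.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data, default=None):
    """Return the encoding implied by a leading BOM, or default if none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return default


def detect_file_bom(path, default=None):
    """Read only the first four bytes of a file and check them for a BOM."""
    with open(path, "rb") as f:
        return detect_bom(f.read(4), default)
```

A file without a BOM yields `default`, which says nothing about its real encoding. A UTF-16 LE file whose first character is U+0000 is byte-for-byte indistinguishable from UTF-32 LE at this level, which is one more reason to treat the result as a hint.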

Answer 7

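A simple solution might be opening the file in Firefox.

Drag and drop the file into Firefox
Press Ctrl+I to open the page info

and the text encoding will appear in the "Page Info" window.

Note: If the file is not in txt format, just rename it to txt and try again.

P.S. For more info see this article.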

Answer 8

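I wrote the #4 answer (at time of writing). But lately I have git installed on all my computers, so now I use @Sybren's solution. Here is a new answer that makes that solution handy from powershell (without putting all of git/usr/bin in the PATH, which is too much clutter for me).

Add this to your profile.ps1: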

$global:gitbin = 'C:\Program Files\Git\usr\bin'
Set-Alias file.exe $gitbin\file.exe

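And use it like: file.exe --mime-encoding *. You must include .exe in the command for the PS alias to work.

But if you don't customize your PowerShell profile.ps1, I suggest you start with mine: https://gist.github.com/yzorg/8215221/8e38fd722a3dfc526bbe4668d1f3b08eb7c08be0 and save it to ~\Documents\WindowsPowerShell. It's safe to use on a computer without git, but it will write warnings when git is not found.

The .exe in the command is also how I use C:\WINDOWS\system32\where.exe from powershell, and many other OS CLI commands that are "hidden by default" by powershell, *shrug*.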
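
The same idea, calling git's file.exe by its full path instead of adding the whole directory to PATH, can be sketched in a script as well. This Python example is an addition and rests on assumptions: the default install location is the one quoted in the profile snippet, and `locate_file_tool` and `mime_encoding` are hypothetical names.

```python
import shutil
import subprocess
from pathlib import Path

# Assumed default install location of Git for Windows.
DEFAULT_CANDIDATES = (r"C:\Program Files\Git\usr\bin\file.exe",)


def locate_file_tool(candidates=DEFAULT_CANDIDATES):
    """Return a path to the `file` executable, or None if it cannot be found."""
    on_path = shutil.which("file")  # honours PATH (and PATHEXT on Windows)
    if on_path:
        return on_path
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    return None


def mime_encoding(path, tool=None):
    """Ask `file` for the encoding of one file and return it as a string."""
    tool = tool or locate_file_tool()
    if tool is None:
        raise FileNotFoundError("file executable not found")
    result = subprocess.run(
        [tool, "--brief", "--mime-encoding", str(path)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```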

Answer 9

You can simply check that by opening git bash at the file location and then running the command file -i file_name

Example

user filesData
$ file -i data.csv
data.csv: text/csv; charset=utf-8

Answer 10

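Some C code here for reliable ascii, BOM, and utf8 detection: https://unicodebook.readthedocs.io/guess_encoding.html

Only ASCII, UTF-8 and encodings using a BOM (UTF-7 with BOM, UTF-8 with BOM, UTF-16, and UTF-32) have reliable algorithms to get the encoding of a document. For all other encodings, you have to trust heuristics based on statistics.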
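
The ASCII and UTF-8 checks are reliable because both formats can be validated strictly: ASCII allows no byte above 0x7F, and UTF-8 multi-byte sequences follow a fixed pattern that text in legacy code pages rarely satisfies by accident. A minimal Python sketch of that validation follows; it is an addition, not taken from the linked page, `classify` is a hypothetical name, and BOM handling is deliberately left out.

```python
def classify(data):
    """Classify bytes as 'ascii', 'utf-8', or 'unknown' by strict validation."""
    if data.isascii():
        # Every byte is below 0x80, which is also valid UTF-8.
        return "ascii"
    try:
        data.decode("utf-8")  # strict mode rejects any malformed sequence
        return "utf-8"
    except UnicodeDecodeError:
        # Could be a legacy code page or binary data; only heuristics
        # based on statistics can guess further.
        return "unknown"


print(classify(b"plain text"))             # ascii
print(classify("café".encode("utf-8")))    # utf-8
print(classify("café".encode("cp1252")))   # unknown
```

The whole file should be passed in: validating only a prefix can cut a multi-byte character in half and wrongly report "unknown".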

EDIT:

A powershell version of a C# answer from: Effective way to find any file's Encoding. Only works with signatures (BOMs).

# get-encoding.ps1
param([Parameter(ValueFromPipeline=$True)] $filename)    
begin {
  # set .net current directory
  [Environment]::CurrentDirectory = (pwd).path
}
process {
  $reader = [System.IO.StreamReader]::new($filename, 
    [System.Text.Encoding]::default,$true)
  $peek = $reader.Peek()
  $encoding = $reader.currentencoding
  $reader.close()
  [pscustomobject]@{Name=split-path $filename -leaf
                BodyName=$encoding.BodyName
                EncodingName=$encoding.EncodingName}
}


.\get-encoding chinese8.txt

Name         BodyName EncodingName
----         -------- ------------
chinese8.txt utf-8    Unicode (UTF-8)


get-childitem -file | .\get-encoding

Answer 11

You can use a free utility called Encoding Recognizer (requires java). You can find it at http://mindprod.com/products2.html#ENCODINGRECOGNISER

Answer 12

Looking for a Node.js/npm solution? Try encoding-checker:

npm install -g encoding-checker

Usage

Usage: encoding-checker [-p pattern] [-i encoding] [-v]
 
Options:
  --help                 Show help                                     [boolean]
  --version              Show version number                           [boolean]
  --pattern, -p, -d                                               [default: "*"]
  --ignore-encoding, -i                                            [default: ""]
  --verbose, -v                                                 [default: false]

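Examples

Get the encoding of all files in the current directory:

encoding-checker

Return the encoding of all md files in the current directory:

encoding-checker -p "*.md"

Get the encoding of all files in the current directory and its subfolders (this takes quite some time for huge folders and may seem unresponsive):

encoding-checker -p "**"

For more examples refer to the npm documentation or the official repository.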

Answer 13

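Similar to the Notepad solution listed above, you can also open the file in Visual Studio, if you're using that. In Visual Studio, you can select "File > Advanced Save Options..."

The "Encoding:" combo box will tell you specifically which encoding the file currently uses. It lists many more text encodings than Notepad does, so it's useful when dealing with various files from around the world and whatever else.

Just like Notepad, you can also change the encoding from the list of options there, and then save the file after hitting "OK". You can also select the encoding you want through the "Save with Encoding..." option in the Save As dialog (by clicking the arrow next to the Save button).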

Answer 14

The only way that I have found to do this is VIM or Notepad++.

Answer 15

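EncodingChecker

File Encoding Checker is a GUI tool that allows you to validate the text encoding of one or more files. The tool can display the encoding for all selected files, or only the files that do not have the encodings you specify.

File Encoding Checker requires .NET 4 or above to run.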

Answer 1: 306, accepted, 2012-11-20 00:27:03
Answer 2: 107, 2017-04-19 07:37:36
Answer 3: 77, 2016-01-13 11:58:49
Answer 4: 26, 2013-01-09 08:51:26
Answer 5: 25, 2019-11-15 14:57:45
Answer 6: 20, 2015-01-22 00:02:08
Answer 7: 12, 2019-08-08 17:37:28
Answer 8: 9, 2017-10-18 17:36:44
Answer 9: 4, 2022-02-23 14:04:52
Answer 10: 3, 2018-11-08 17:43:02
Answer 11: 3, 2011-05-06 20:52:35
Answer 12: 3, 2021-01-27 21:22:39
Answer 13: 2, 2016-10-11 18:57:00
Answer 14: 2, 2017-09-14 15:49:44
Answer 15: 2, 2020-07-08 16:29:05
