
How to check if a .txt file is in ASCII or UTF-8 format in a Windows environment?

I have converted a .txt file from ASCII to UTF-8 using UltraEdit. However, I am not sure how to verify that it is in UTF-8 format in a Windows environment.

Thank you!

Open the file in Notepad. Click 'Save As...'. In the 'Encoding:' combo box you will see the current file format.

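Open the file in Notepad++ and check the 'Encoding' menu. There you can see the current encoding and/or convert the file to any of the available encodings.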

Text files in Windows don't have a format. There's an unofficial convention that if the file starts with the BOM code point encoded in UTF-8, then the file is UTF-8, but that convention isn't universally supported. That would be the 3-byte sequence "\xef\xbb\xbf", i.e. ï»¿ in the Latin-1 character set.
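As a quick sanity check of those byte values, here is a minimal Python sketch (Python is my choice for illustration, not something the question mentions; it assumes Python 3.8 or later for `bytes.hex(" ")`). It encodes the BOM code point U+FEFF as UTF-8 and then views the resulting bytes through Latin-1.

```python
import codecs

# U+FEFF is the byte order mark code point; encoded as UTF-8 it becomes 3 bytes.
bom_bytes = "\ufeff".encode("utf-8")
print(bom_bytes.hex(" "))             # hex dump of the three bytes

# The standard library exposes the same sequence as a constant.
print(bom_bytes == codecs.BOM_UTF8)

# Viewed through Latin-1, each byte maps to exactly one character.
bom_as_latin1 = bom_bytes.decode("latin-1")
print(bom_as_latin1)
```

This is why a UTF-8 file with a BOM shows three odd characters at the start when a viewer wrongly treats it as Latin-1.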

Open it in a hex editor and make sure the first three bytes are the UTF-8 BOM (EF BB BF).
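If no hex editor is at hand, the same check can be sketched in a few lines of Python. The function names `has_utf8_bom` and `first_bytes_hex` are hypothetical helpers made up for this example, and the sketch assumes Python 3.8 or later.

```python
import codecs

def has_utf8_bom(path):
    """Return True if the file at `path` starts with the bytes EF BB BF."""
    with open(path, "rb") as f:          # binary mode: no decoding is applied
        return f.read(3) == codecs.BOM_UTF8

def first_bytes_hex(path, count=16):
    """Return the first `count` bytes as space-separated hex, like one hex editor row."""
    with open(path, "rb") as f:
        return f.read(count).hex(" ")
```

Calling `first_bytes_hex` on the converted file and looking for a leading `ef bb bf` is the scripted equivalent of the hex editor check. Note that a file without those bytes may still be valid UTF-8, since the BOM is optional.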

If you use Windows 10 and have Windows Subsystem for Linux (WSL) installed, you can check this easily by running "file" followed by the file name from the shell.

For example:

$ file code.cpp

code.cpp: C source, UTF-8 Unicode (with BOM) text, with CRLF line terminators
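Where WSL is not available, a rough content-based check can be sketched in Python. This is not how `file` is implemented; it is a simplified illustration that tries a strict UTF-8 decode, and `classify_encoding` and `classify_file` are hypothetical names. It assumes Python 3.7 or later for `bytes.isascii()`.

```python
def classify_encoding(data: bytes) -> str:
    """Classify raw bytes as 'ASCII', 'UTF-8', 'UTF-8 (with BOM)' or 'not UTF-8'."""
    has_bom = data.startswith(b"\xef\xbb\xbf")
    try:
        data.decode("utf-8")             # strict mode: raises on any invalid sequence
    except UnicodeDecodeError:
        return "not UTF-8"               # e.g. an ANSI code page such as Windows-1252
    if has_bom:
        return "UTF-8 (with BOM)"
    if data.isascii():                   # every byte is below 0x80
        return "ASCII"
    return "UTF-8"

def classify_file(path):
    """Read the whole file in binary mode and classify its contents."""
    with open(path, "rb") as f:
        return classify_encoding(f.read())
```

One consequence worth keeping in mind: a file that contains only ASCII characters is byte-for-byte identical in ASCII and in UTF-8 without a BOM, so for such a file there is nothing to tell apart after the conversion.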

I had a directory of files that I wanted to check. I created an Excel macro to determine ANSI vs. UTF-8. This worked for me.

    Sub GetTextFileEncoding()
        Dim sFile As String
        Dim sPath As String
        Dim iRow As Long

        'Set defaults and initial values
        iRow = 1
        sPath = "C:\textfiles\"    'Folder to scan; must end with a backslash
        sFile = Dir(sPath & "*.txt")

        Do While Len(sFile) > 0
            'Show file name and detected encoding on the active worksheet
            Cells(iRow, 1).Value = sFile
            Cells(iRow, 2).Value = FileEncodeType(sPath & sFile)

            'Get next file
            sFile = Dir

            'Increment row
            iRow = iRow + 1
        Loop
    End Sub

    'Returns "UTF-8" if the file starts with the UTF-8 BOM (EF BB BF), otherwise "ANSI".
    'Note: UTF-8 files saved without a BOM are also reported as "ANSI".
    Function FileEncodeType(sFile As String) As String
        Dim iFile As Integer
        Dim bom(0 To 2) As Byte

        FileEncodeType = "ANSI"

        'Read the first three bytes in binary mode so no code-page conversion is applied
        iFile = FreeFile
        Open sFile For Binary Access Read As #iFile
        If LOF(iFile) >= 3 Then
            Get #iFile, , bom
            If bom(0) = &HEF And bom(1) = &HBB And bom(2) = &HBF Then
                FileEncodeType = "UTF-8"
            End If
        End If
        Close #iFile
    End Function

