
How to properly output a string in a Windows console with Go?

I have an exe written in Go that prints UTF-8 encoded strings containing special characters.
Since that exe is meant to be run from a console window, its output is mangled, because the Windows console uses the ibm850 encoding (aka code page 850).

How would you make sure the Go exe prints correctly encoded strings in a console window, i.e. prints for instance:

éèïöîôùòèìë

instead of (without any translation to the right charset):

├®├¿├»├Â├«├┤├╣├▓├¿├¼├½
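
The garbled output above can be reproduced by hand: each accented letter takes two bytes in UTF-8, and a code page 850 console renders every byte as a separate character. The following is a minimal sketch using only the standard library. The names `cp850` and `mojibake` are made up for illustration, and the table is deliberately partial: it covers only the bytes that occur in the sample string.

```go
package main

import "fmt"

// cp850 is a partial, hand-written table (an assumption for illustration):
// it maps only the byte values found in the UTF-8 encoding of the sample
// string to the characters a code page 850 console displays for them.
var cp850 = map[byte]rune{
	0xC3: '├', 0xA9: '®', 0xA8: '¿', 0xAF: '»', 0xB6: 'Â', 0xAE: '«',
	0xB4: '┤', 0xB9: '╣', 0xB2: '▓', 0xAC: '¼', 0xAB: '½',
}

// mojibake returns what a cp850 console would display if it received the
// raw UTF-8 bytes of s. ASCII bytes are unchanged; bytes missing from the
// partial table become '?'.
func mojibake(s string) string {
	out := make([]rune, 0, len(s))
	for i := 0; i < len(s); i++ { // iterate over bytes, not runes
		b := s[i]
		if b < 0x80 {
			out = append(out, rune(b))
		} else if r, ok := cp850[b]; ok {
			out = append(out, r)
		} else {
			out = append(out, '?')
		}
	}
	return string(out)
}

func main() {
	fmt.Println(mojibake("éèïöîôùòèìë"))
}
```

Every letter starts with the byte 0xC3, which is why the box-drawing character ├ repeats throughout the mangled line.
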
// Alert: This is Windows-specific, uses undocumented methods, does not
// handle stdout redirection, does not check for errors, etc.
// Use at your own risk.
// Tested with Go 1.0.2-windows-amd64.

package main

import (
    "syscall"
    "unicode/utf16"
    "unsafe"
)

var modkernel32 = syscall.NewLazyDLL("kernel32.dll")
var procWriteConsoleW = modkernel32.NewProc("WriteConsoleW")

// consolePrintString converts a UTF-8 string to UTF-16 and writes it
// directly to the console with WriteConsoleW, bypassing the code page.
func consolePrintString(strUtf8 string) {
    // WriteConsoleW stores the number of characters written here.
    var charsWritten uint32

    strUtf16 := utf16.Encode([]rune(strUtf8))
    if len(strUtf16) < 1 {
        return
    }

    // The length argument is counted in UTF-16 code units, not bytes.
    syscall.Syscall6(procWriteConsoleW.Addr(), 5,
        uintptr(syscall.Stdout),
        uintptr(unsafe.Pointer(&strUtf16[0])),
        uintptr(len(strUtf16)),
        uintptr(unsafe.Pointer(&charsWritten)),
        uintptr(0),
        0)
}

func main() {
    consolePrintString("Hello ☺\n")
    consolePrintString("éèïöîôùòèìë\n")
}
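
The length passed to WriteConsoleW in the code above is a count of UTF-16 code units, which can differ from both the byte length and the rune count of the Go string. A small, platform-independent sketch of that conversion step follows; `toUTF16` is a hypothetical helper name, and the emoji is just an example of a character outside the Basic Multilingual Plane.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// toUTF16 converts a UTF-8 Go string to UTF-16 code units, the form
// expected by wide-character Windows APIs such as WriteConsoleW.
func toUTF16(s string) []uint16 {
	return utf16.Encode([]rune(s))
}

func main() {
	for _, s := range []string{"Hello ☺\n", "😀"} {
		u := toUTF16(s)
		// Compare the three different notions of "length".
		fmt.Printf("%q: %d bytes, %d runes, %d UTF-16 units\n",
			s, len(s), len([]rune(s)), len(u))
	}
}
```

Characters above U+FFFF are encoded as a surrogate pair, so they occupy two code units even though they are a single rune.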

The online book "Network programming with Go" (CC BY-NC-SA 3.0) has a chapter on charsets (Managing character sets and encodings), in which Jan Newmarch details the conversion of one charset to another. But it seems cumbersome.

Here is a solution (I might have missed a much simpler one), using the library go-charset (from Roger Peppe).
I translate a UTF-8 string to an ibm850-encoded one, which allows me to print in a DOS window:

éèïöîôùòèìë

The translation function is detailed below:

package main

import (
    "bytes"
    "code.google.com/p/go-charset/charset"
    _ "code.google.com/p/go-charset/data"
    "fmt"
    "io"
    "log"
    "strings"
)

// translate runs the whole input string through the given translator.
func translate(tr charset.Translator, in string) (string, error) {
    var buf bytes.Buffer
    r := charset.NewTranslatingReader(strings.NewReader(in), tr)
    _, err := io.Copy(&buf, r)
    if err != nil {
        return "", err
    }
    return buf.String(), nil
}

// Utf2dos converts a UTF-8 string to ibm850 (code page 850) bytes.
func Utf2dos(in string) string {
    dosCharset := "ibm850"
    cs := charset.Info(dosCharset)
    if cs == nil {
        log.Fatalf("no info found for %q", dosCharset)
    }
    fromtr, err := charset.TranslatorTo(dosCharset)
    if err != nil {
        log.Fatalf("error making translator to %q: %v", dosCharset, err)
    }
    out, err := translate(fromtr, in)
    if err != nil {
        log.Fatalf("error translating to %q: %v", dosCharset, err)
    }
    return out
}

func main() {
    test := "éèïöîôùòèìë"
    fmt.Println("utf-8:\n", test)
    fmt.Println("ibm850:\n", Utf2dos(test))
}
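
To make it concrete what such a translation produces, here is a minimal standard-library sketch of the same mapping for the sample string only. It is not how go-charset works internally: `toCP850Table` and `toCP850` are hypothetical names, and the table is a hand-written subset of code page 850 that covers just the ten accented letters used above.

```go
package main

import "fmt"

// toCP850Table is a partial, hand-written mapping (an assumption for
// illustration) from Unicode characters to code page 850 byte values.
var toCP850Table = map[rune]byte{
	'é': 0x82, 'è': 0x8A, 'ï': 0x8B, 'ö': 0x94, 'î': 0x8C,
	'ô': 0x93, 'ù': 0x97, 'ò': 0x95, 'ì': 0x8D, 'ë': 0x89,
}

// toCP850 encodes s as code page 850 bytes. ASCII is copied unchanged;
// characters missing from the partial table are replaced by '?'.
func toCP850(s string) []byte {
	out := make([]byte, 0, len(s))
	for _, r := range s { // iterate over runes, not bytes
		if r < 0x80 {
			out = append(out, byte(r))
		} else if b, ok := toCP850Table[r]; ok {
			out = append(out, b)
		} else {
			out = append(out, '?')
		}
	}
	return out
}

func main() {
	// Print the encoded bytes in hex: one byte per accented letter.
	fmt.Printf("% X\n", toCP850("éèïöîôùòèìë"))
}
```

Each two-byte UTF-8 sequence collapses to a single byte, which is what the console expects when its active code page is 850.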

Since 2016 (and still in 2017), you can consider golang.org/x/text, which comes with an encoding/charmap package that includes the ISO-8859 family as well as the Windows 1252 character set.

See "Go Quickly - Converting Character Encodings In Golang":

r := charmap.ISO8859_1.NewDecoder().Reader(f)
io.Copy(out, r)

That is an extract of an example that opens an ISO-8859-1 source text (my_isotext.txt), creates a destination file (my_utf.txt), and copies the first to the second.
To decode from ISO-8859-1 to UTF-8 along the way, the original file reader (f) is wrapped with a decoder.
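
For ISO-8859-1 specifically, the decoding step is simple enough to sketch with the standard library alone, because every byte value equals the Unicode code point of the character it represents. The function below is an illustrative stand-in, not the charmap implementation; `latin1ToUTF8` is a hypothetical name, and it works on a byte slice rather than a stream to keep the example short.

```go
package main

import "fmt"

// latin1ToUTF8 decodes ISO-8859-1 bytes into a UTF-8 Go string.
// Each input byte is widened to the rune with the same numeric value;
// converting the runes to a string produces the UTF-8 encoding.
func latin1ToUTF8(b []byte) string {
	runes := make([]rune, len(b))
	for i, c := range b {
		runes[i] = rune(c)
	}
	return string(runes)
}

func main() {
	// "café" as ISO-8859-1: the final byte 0xE9 is 'é'.
	src := []byte{'c', 'a', 'f', 0xE9}
	fmt.Println(latin1ToUTF8(src))
}
```

Other single-byte code pages, such as code page 850, need a real lookup table instead of this identity mapping, which is what the charmap package provides.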

I just tested (pseudo-code for illustration):

package main

import (
    "fmt"

    "golang.org/x/text/encoding/charmap"
)

func main() {
    // t stands in for bytes that are encoded in code page 850.
    t := "string composed of character in cp 850"
    // The decoder converts from code page 850 to UTF-8.
    d := charmap.CodePage850.NewDecoder()
    st, err := d.String(t)
    if err != nil {
        panic(err)
    }
    fmt.Println(st)
}

The result is a string readable in a Windows CMD.
See more in this Nov. 2018 reddit thread.

It is something that Go still can't do out of the box - see http://code.google.com/p/go/issues/detail?id=3376#c6 .

Alex

