
Read US-ASCII File in Go

Currently I'm trying to read a US-ASCII file in Go, but every time I do, every special character such as Ä, Ö, Ü or ß gets replaced with a ?, or, in my database, with the special replacement sign (�).

Is there anything I can do to prevent this?

Here is how I read my file:

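// imports: bufio, fmt, io, os, strings
file, err := os.Open(path)
if err != nil {
    return err
}
defer file.Close()
var lines []string
r := bufio.NewReader(file)
for {
    line, err := r.ReadBytes('\n')
    // ReadBytes returns a final line without '\n' together with io.EOF,
    // so append it before checking the error.
    lines = append(lines, string(line))
    if err == io.EOF {
        break
    }
    if err != nil {
        return err
    }
}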
fmt.Println(strings.Join(lines, ""))
index.Content = strings.Join(lines, "")

Since the letters Ä, Ö, Ü and ß don't exist in US-ASCII, I'd make an educated guess that you are actually dealing with the Latin-1 (ISO-8859-1) encoding.

Converting from Latin-1 can be done like this:

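// line holds the raw Latin-1 bytes returned by ReadBytes.
runes := make([]rune, len(line))
for i, b := range line {
    // A Latin-1 byte value equals its Unicode code point.
    runes[i] = rune(b)
}
lines = append(lines, string(runes))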

Edit:

The example is not optimized, but it shows how a Latin-1 byte can be stored in a rune, since Latin-1 values correspond directly to Unicode code points. The actual encoding into UTF-8 happens when converting []rune to string.

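Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.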

 
© 2020-2024 STACKOOM.COM