
Swift EUC-KR Korean encoding not working, but it works in Python

I am writing some code to parse Korean text from a server that uses the EUC-KR encoding.

When I do the encoding in Python, it works as expected. But when I do it in Swift as shown below, the encoding doesn't work: the result is unreadable.

In Python:

string = u'안녕하세요.'.encode('eucKR') 

In Swift:

let encoding:UInt =  CFStringConvertEncodingToNSStringEncoding(CFStringEncoding(
        CFStringEncodings.EUC_KR.rawValue))

let encodedData = "안녕하세요.".data(using: String.Encoding(rawValue: encoding))!

What is the difference between these two encodings?

Below is the full source code for both Python and Swift. I am still stuck on the encoding part. Is the problem related to the Alamofire POST request?

Python:

import requests
from pattern import web

string = u'저는 내일 바빠서 학교에 못갑니다.'.encode('eucKR')
r = requests.post("http://nlp.korea.ac.kr/~demo/dglee/komatag.php", data={'formradio1': '', 'formradio2': 'ems', 'textarea': string})
dom = web.Element(r.text)
main = dom('tr')
for item in main:
    result = web.plaintext(item.source)
    a = result.encode('ISO-8859-1')
    t=a.decode('eucKR')
    print(t)

Swift:

    override func viewDidLoad() {

        let string: NSString = NSString(string: "안녕하세요")
        let encodedEucKr = stringToEuckrString(stringValue: string as String)
        print(encodedEucKr)

        Alamofire.request("http://nlp.korea.ac.kr/~demo/dglee/komatag.php", method: .post, parameters: ["formradio1":"", "formradio2":"ems", "textarea": encodedEucKr], headers: nil).responseString { response in

            switch(response.result) {
            case .success(_):
                if let data = response.result.value{
                    print(response.result.value)
                }
                break

            case .failure(_):
                print(response.result.error)
                break

            }
        }

    }


func stringToEuckrString(stringValue: String) -> String {

    let encoding:UInt = CFStringConvertEncodingToNSStringEncoding(CFStringEncoding(
        CFStringEncodings.EUC_KR.rawValue))

    let encodedData = stringValue.data(using: String.Encoding(rawValue: encoding))!

    let attributedString = try? NSAttributedString(data: encodedData, options:[:],        documentAttributes: nil)

    if let _ = attributedString {
        return attributedString!.string
    } else {
        return ""
    }
}

This was not easy, for two reasons:

  1. Sending form data in EUC-KR is not considered standard-compliant in modern web technologies.

  2. The response sent from your server is somewhat broken, in that Swift cannot decode the result as valid EUC-KR text.

    (This seems to be a bug in your server-side code.)

Anyway, when you need to send a web-form-based request to your server in EUC-KR:

  • Create an EUC-KR byte sequence from the original string
  • Percent-escape it. You may need to do this yourself
  • Put the entire request in the HTTP request body
  • Add the proper MIME type header
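As a cross-check for the four steps above, here is a minimal Python sketch that builds the same kind of request body without sending anything. The helper name `build_form_body` is hypothetical; the field names are the ones from the question, and `quote_plus` comes from Python's standard `urllib.parse` module.

```python
from urllib.parse import quote_plus


def build_form_body(fields, charset="euc-kr"):
    """Build an application/x-www-form-urlencoded body in the given charset."""
    pairs = []
    for name, value in fields.items():
        raw = value.encode(charset)   # step 1: EUC-KR byte sequence
        escaped = quote_plus(raw)     # step 2: percent-escape the bytes (space becomes '+')
        pairs.append(f"{quote_plus(name)}={escaped}")
    # step 3: after escaping, the body is plain ASCII
    return "&".join(pairs).encode("ascii")


body = build_form_body({"formradio1": "", "formradio2": "ems", "textarea": "안녕하세요"})
# step 4: the MIME type header to send along with the body
headers = {"Content-Type": "application/x-www-form-urlencoded"}
print(body.decode("ascii"))
# prints: formradio1=&formradio2=ems&textarea=%BE%C8%B3%E7%C7%CF%BC%BC%BF%E4
```

Comparing this string with the body produced on the Swift side is a quick way to confirm that the percent-escaping was applied to the EUC-KR bytes rather than to UTF-8 bytes.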

Some details depend on the server. I have never used Alamofire, so I do not know whether it supports this.

Here is an example using a plain URLSession:

override func viewDidLoad() {
    super.viewDidLoad()
    // Do any additional setup after loading the view, typically from a nib.
    sendRequest(string: "안녕하세요")
}

func sendRequest(string: String) {
    let rawEncoding = CFStringConvertEncodingToNSStringEncoding(CFStringEncoding(CFStringEncodings.EUC_KR.rawValue))
    let encoding = String.Encoding(rawValue: rawEncoding)

    let url = URL(string: "http://nlp.korea.ac.kr/~demo/dglee/komatag.php")!
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    //Create an EUC-KR byte sequence
    let eucKRStringData = string.data(using: encoding) ?? Data()
    //Percent-escape, you need to do it by yourself
    //(Though, most servers accept non-escaped binary data with its own rules...)
    let eucKRStringPercentEscaped = eucKRStringData.map {byte->String in
        if byte >= UInt8(ascii: "A") && byte <= UInt8(ascii: "Z")
        || byte >= UInt8(ascii: "a") && byte <= UInt8(ascii: "z")
        || byte >= UInt8(ascii: "0") && byte <= UInt8(ascii: "9")
        || byte == UInt8(ascii: "_") || byte == UInt8(ascii: ".") || byte == UInt8(ascii: "-")
        {
            return String(Character(UnicodeScalar(UInt32(byte))!))
        } else if byte == UInt8(ascii: " ") {
            return "+"
        } else {
            return String(format: "%%%02X", byte)
        }
    }.joined()
    //In application/x-www-form-urlencoded format, you send data in a URL-query like format.
    let paramString = "formradio1=&formradio2=ems&textarea=\(eucKRStringPercentEscaped)"
    //As all non-ASCII characters are percent-escaped, .isoLatin1 works well here.
    let bodyData = paramString.data(using: .isoLatin1)!
    //Form data needs to be sent as a body of HTTP protocol.
    request.httpBody = bodyData
    //MIME type for usual form data is "application/x-www-form-urlencoded".
    request.addValue("application/x-www-form-urlencoded", forHTTPHeaderField: "Content-Type")
    //URLRequest is ready and you can start dataTask here.
    let task = URLSession.shared.dataTask(with: request) {data, response, error in
        if let error = error {
            print("Error:", error)
        }
        if let response = response {
            print("Response:", response)
        }
        //The response may not be valid EUC-KR; you need to decode it while accepting invalid bytes.
        if let data = data {
            var result = ""
            var i = 0
            while i < data.count{
                let ch = data[i]
                if ch < 0x80 {
                    result += String(Character(UnicodeScalar(UInt32(ch))!))
                } else if
                    i + 2 <= data.count,
                    let ch2 = String(data: data.subdata(in: i..<i+2), encoding: encoding)
                {
                    result += ch2
                    i += 1
                } else {
                    result += "?"
                }
                i += 1
            }
            print("Result:", result)
        }
    }
    //Do not forget to resume the created task.
    task.resume()
    //The task runs asynchronously: its result is only available inside the completion handler above, not here.
}
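
The lenient decoding loop in the completion handler above can be mirrored in Python to check a captured response offline. This is only a sketch of the same idea (ASCII bytes pass through, two-byte sequences are tried as EUC-KR, anything else becomes "?"); the function name `decode_euc_kr_lenient` is hypothetical.

```python
def decode_euc_kr_lenient(data: bytes) -> str:
    """Decode EUC-KR bytes, replacing undecodable bytes with '?'."""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte < 0x80:
            # Plain ASCII byte: take it as-is.
            out.append(chr(byte))
            i += 1
            continue
        try:
            # Try the next two bytes as one EUC-KR character.
            out.append(data[i:i + 2].decode("euc-kr"))
            i += 2
        except UnicodeDecodeError:
            # Invalid or truncated sequence: emit a placeholder and skip one byte.
            out.append("?")
            i += 1
    return "".join(out)


print(decode_euc_kr_lenient(b"\xbe\xc8\xffA"))
# prints: 안?A
```

If losing the exact position of bad bytes is acceptable, Python's built-in `data.decode("euc-kr", errors="replace")` is a shorter alternative; it inserts U+FFFD instead of "?".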

If your server side can handle UTF-8 requests and responses properly, the code above can be far simpler. Using EUC-KR in web services is rather outdated. You would do better to adopt UTF-8 soon.
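
To illustrate how much simpler the UTF-8 case is, here is a short Python sketch using only the standard library. It assumes a server that accepts UTF-8 form data, which the server in the question may not; the field names are reused from the question purely for illustration.

```python
from urllib.parse import urlencode, parse_qs

fields = {"formradio1": "", "formradio2": "ems", "textarea": "안녕하세요"}

# urlencode() percent-escapes str values as UTF-8 by default,
# so no manual byte handling is needed.
body = urlencode(fields).encode("ascii")
headers = {"Content-Type": "application/x-www-form-urlencoded"}

# Decoding is symmetric: parse_qs() also assumes UTF-8 by default.
decoded = parse_qs(body.decode("ascii"), keep_blank_values=True)
print(decoded["textarea"][0])
# prints: 안녕하세요
```

The same applies on the Swift side: with UTF-8, the standard percent-encoding helpers and `String(data:encoding:)` with `.utf8` would cover both directions without a hand-written loop.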
