
Problem decoding strings from an XML file in Swift

I have an XML file whose content looks like this:

<?xml version="1.0" encoding="utf-8"?>
<resources>
    <string-array name="array_ngontay_kq">
        <item>Bạn là người đáng tin cậy.</item>
        <item>Bạn là người có óc xét đoán. </item>
    </string-array>
</resources>

I used HTMLReader to read this string-array, but my output looks like this:

Bạn là ngÆ°á»i Äáng tin cậy.
Bạn là ngÆ°á»i có óc xét Äoán.

Here is my code:

let fileURL = Bundle.main.url(forResource: "BoiTay", withExtension: "xml")
let xmlData = try! Data(contentsOf: fileURL!)
let topic = "array_ngontay_kq"
let document = HTMLDocument(data: xmlData, contentTypeHeader: "text/xml")
for item in document.nodes(matchingSelector: "string-array[name='\(topic)'] item") {
    print(item.textContent)
}

Is there any way to fix this, or another way to do this without HTMLReader? Sorry, I'm new to XML parsing and couldn't find an answer or tutorial covering this kind of XML file in Swift.

First of all, you should check whether your BoiTay.xml is really in UTF-8. I'm not familiar with Vietnamese encodings, but some tools generate XML in an encoding other than UTF-8 even when the XML declaration states encoding="utf-8".

The result looks like an encoding issue rather than a bug in the library or in your code.
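To illustrate why this looks like an encoding problem, here is a minimal sketch of how such garbling can arise. It assumes the wrong decoding step is ISO Latin-1, which reproduces the "Æ°" seen in your output for "ư"; the encoding actually involved in your case may be a different one.

```swift
import Foundation

// "ư" is U+01B0, which UTF-8 encodes as the two bytes C6 B0.
let original = "ư"
let utf8Bytes = original.data(using: .utf8)!

// Decoding those bytes as ISO Latin-1 turns each byte into its own character.
let misdecoded = String(data: utf8Bytes, encoding: .isoLatin1)!
print(misdecoded)  // prints "Æ°"

// Undoing the wrong step recovers the text, provided no bytes were lost on the way.
let repaired = String(data: misdecoded.data(using: .isoLatin1)!, encoding: .utf8)!
print(repaired)    // prints "ư"
```

If the same pattern shows up for every accented letter in your output, the bytes are probably fine and only the decoding step is wrong.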

Please show a hex dump of your xmlData, including the first item element.

print(xmlData as NSData)

The first 256 bytes would probably be enough.
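If you prefer to check this yourself, the following is a rough sketch of two helpers. The names isValidUTF8 and hexPrefix are made up for this example and are not part of Foundation or HTMLReader. The first reports whether the data decodes as UTF-8 at all, the second formats the leading bytes as hex.

```swift
import Foundation

// Hypothetical helper: true when the whole buffer decodes as UTF-8.
func isValidUTF8(_ data: Data) -> Bool {
    return String(data: data, encoding: .utf8) != nil
}

// Hypothetical helper: hex string of the first `limit` bytes, separated by spaces.
func hexPrefix(_ data: Data, limit: Int = 256) -> String {
    return data.prefix(limit)
        .map { String(format: "%02x", $0) }
        .joined(separator: " ")
}

// Example with in-memory data; replace it with the Data loaded from BoiTay.xml.
let sampleData = "<item>Bạn</item>".data(using: .utf8)!
print(isValidUTF8(sampleData))
print(hexPrefix(sampleData, limit: 16))
```

Note that a file can be valid UTF-8 and still contain text that was garbled before it was saved, so a true result only rules out one class of problems.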


By the way, using XMLParser is not that difficult (though it is not trivial either).

Here is an example you can test in a Playground.

import Foundation

class ResoucesParsingDelegate: NSObject, XMLParserDelegate {
    //Property to keep all `string-array`s by name
    var stringArrays: [String: [String]] = [:]

    var nameParsing: String? = nil
    var stringArrayParsing: [String]? = nil

    var currentText: String? = nil

    func parserDidStartDocument(_ parser: XMLParser) {
        print(#function)
    }

    func parserDidEndDocument(_ parser: XMLParser) {
        print(#function)
    }

    func parser(_ parser: XMLParser, parseErrorOccurred parseError: Error) {
        print(#function, parseError)
    }

    func parser(_ parser: XMLParser, didStartElement elementName: String, namespaceURI: String?, qualifiedName qName: String?, attributes attributeDict: [String : String] = [:]) {
        switch elementName {
        case "string-array":
            guard let name = attributeDict["name"] else {
                print("`string-array` element needs `name` attribute")
                return
            }
            //When you find `<string-array name="...">`, prepare a string array to keep items with its name
            nameParsing = name
            stringArrayParsing = []
        case "item":
            if stringArrayParsing == nil {
                print("invalid `item` element")
                return
            }
            //When you find `<item>`, prepare a string to keep the content text of the element
            currentText = ""
        //Process other elements
        //...
        default:
            print("Unknown element `\(elementName)`, ignored")
        }
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String, namespaceURI: String?, qualifiedName qName: String?) {
        switch elementName {
        case "string-array":
            if stringArrayParsing == nil || nameParsing == nil {
                print("invalid end tag `string-array`")
                return
            }
            //When you find `</string-array>`, add the current string array to `stringArrays` with its name
            stringArrays[nameParsing!] = stringArrayParsing!
            //Clear string array for next use
            stringArrayParsing = nil
        case "item":
            if stringArrayParsing == nil || currentText == nil {
                print("invalid end tag `item`")
                return
            }
            //When you find `</item>` add the content text to `stringArrayParsing`
            stringArrayParsing!.append(currentText!)
            //Clear content text for next use
            currentText = nil
        //Process other elements
        //...
        default:
            print("Unknown element `\(elementName)`, ignored")
        }
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        if currentText == nil {
            //Silently ignore characters while the content string is not ready
            return
        }
        currentText! += string
    }
}

let xmlText = """
<?xml version="1.0" encoding="utf-8"?>
<resources>
    <string-array name="array_ngontay_kq">
        <item>Bạn là người đáng tin cậy.</item>
        <item>Bạn là người có óc xét đoán. </item>
    </string-array>
</resources>
"""

let xmlData = xmlText.data(using: .utf8)!

print(xmlData, xmlData as NSData)

let parser = XMLParser(data: xmlData)
let resoucesParsingDelegate = ResoucesParsingDelegate()
parser.delegate = resoucesParsingDelegate
parser.parse()

print(resoucesParsingDelegate.stringArrays)

Output:

246 bytes <3c3f786d 6c207665 7273696f 6e3d2231 2e302220 656e636f 64696e67 3d227574 662d3822 3f3e0a3c 7265736f 75726365 733e0a20 2020203c 73747269 6e672d61 72726179 206e616d 653d2261 72726179 5f6e676f 6e746179 5f6b7122 3e0a2020 20202020 20203c69 74656d3e 42e1baa1 6e206cc3 a0206e67 c6b0e1bb 9d6920c4 91c3a16e 67207469 6e2063e1 baad792e 3c2f6974 656d3e0a 20202020 20202020 3c697465 6d3e42e1 baa16e20 6cc3a020 6e67c6b0 e1bb9d69 2063c3b3 20c3b363 2078c3a9 7420c491 6fc3a16e 2e203c2f 6974656d 3e0a2020 20203c2f 73747269 6e672d61 72726179 3e0a3c2f 7265736f 75726365 733e>
parserDidStartDocument
Unknown element `resources`, ignored
Unknown element `resources`, ignored
parserDidEndDocument
["array_ngontay_kq": ["Bạn là người đáng tin cậy.", "Bạn là người có óc xét đoán. "]]

If you test this code with the contents of your BoiTay.xml and get a result similar to HTMLReader's, the problem is definitely an encoding issue.
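To run that test against file data rather than an inline string, something like the sketch below might be convenient. ItemCollector and stringArrayItems are hypothetical names introduced here. It is a cut-down variant of the delegate above that only collects the items of one named string-array and returns nil when XMLParser reports a failure.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in this module on non-Apple platforms
#endif

// Hypothetical compact delegate: collects the <item> texts of one named <string-array>.
final class ItemCollector: NSObject, XMLParserDelegate {
    let targetName: String
    private(set) var items: [String] = []
    private var insideTarget = false
    private var currentText: String? = nil

    init(targetName: String) {
        self.targetName = targetName
        super.init()
    }

    func parser(_ parser: XMLParser, didStartElement elementName: String, namespaceURI: String?, qualifiedName qName: String?, attributes attributeDict: [String : String] = [:]) {
        if elementName == "string-array" {
            // Only collect items while inside the requested array
            insideTarget = (attributeDict["name"] == targetName)
        } else if elementName == "item" && insideTarget {
            currentText = ""
        }
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        // Text may arrive in several chunks, so append rather than assign
        currentText?.append(string)
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String, namespaceURI: String?, qualifiedName qName: String?) {
        if elementName == "item", let text = currentText {
            items.append(text)
            currentText = nil
        } else if elementName == "string-array" {
            insideTarget = false
        }
    }
}

// Hypothetical helper: returns the items of the named array, or nil if parsing failed.
func stringArrayItems(in data: Data, arrayNamed name: String) -> [String]? {
    let collector = ItemCollector(targetName: name)
    let parser = XMLParser(data: data)
    parser.delegate = collector
    return parser.parse() ? collector.items : nil
}
```

With the data loaded as in the question, the call would be stringArrayItems(in: xmlData, arrayNamed: "array_ngontay_kq"). If the strings it returns are garbled in the same way, the file contents are the likely cause rather than HTMLReader.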

(You may need to modify this code if your actual XML is more complex than the example.)
