
Removing everything between a certain set of characters with Swift

I'm quite new to Swift and native programming. For a small personal project, I'm pulling in the full HTML of a Twitter search results page and trying to extract just the text of the first tweet. I've got to the point where I can isolate the first tweet, including all the tags inside it, but I'm a bit clueless about how to keep only the text and remove the HTML elements.

For example, it's easy enough to take a single tweet and filter out the specific <a href=""> and <span> tags it contains, but as soon as the tweet or the search changes, that hard-coded filtering no longer works. What I'm really looking for is a way to remove everything in a string that starts with < and ends with >, so I can throw away everything I don't need. I'm using "string.componentsSeparatedByString()" to grab the one tweet I need out of all the HTML, but I can't use that method to strip the remaining tags.
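The step of isolating one tweet out of the whole page can also be done without splitting the entire document into an array. A minimal sketch using Foundation's `range(of:)` instead of `componentsSeparatedByString()`; the helper name `firstText(between:and:in:)` and the marker strings are hypothetical, since the real Twitter markup isn't shown here and changes over time.

```swift
import Foundation

/// Hypothetical helper: returns the text between the first occurrence of
/// `start` and the next occurrence of `end` after it, or nil if either
/// marker is missing.
func firstText(between start: String, and end: String, in html: String) -> String? {
    // Locate the opening marker.
    guard let startRange = html.range(of: start) else { return nil }
    // Search for the closing marker only after the opening one.
    guard let endRange = html.range(of: end, range: startRange.upperBound..<html.endIndex) else {
        return nil
    }
    return String(html[startRange.upperBound..<endRange.lowerBound])
}

// Made-up markup standing in for a search results page.
let page = "<div><p class=\"tweet-text\">First <b>tweet</b></p><p class=\"tweet-text\">Second</p></div>"
print(firstText(between: "<p class=\"tweet-text\">", and: "</p>", in: page) ?? "not found")
// First <b>tweet</b>
```

Note that the result still contains inline tags such as `<b>`, so it still needs a tag-removal step afterwards.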

Please bear with me, since I'm quite new at this. I'm aware that I may not be doing this right at all and that there might be a much easier way to pull a single tweet without all this hassle. If so, please let me know as well.
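For the literal request (drop every run of characters that starts with `<` and ends with `>`), a minimal sketch using a regular-expression replacement from Foundation. The function name `stripTags` is hypothetical, and this is a naive approach: it does not decode entities such as `&amp;`, and a `>` inside an attribute value or the contents of a `<script>` element will confuse it.

```swift
import Foundation

/// Hypothetical helper: removes every `<...>` run from `html`.
/// The pattern `<[^>]+>` matches a `<`, one or more characters that are
/// not `>`, and then the closing `>`.
func stripTags(_ html: String) -> String {
    return html.replacingOccurrences(of: "<[^>]+>",
                                     with: "",
                                     options: .regularExpression)
}

let tweet = "<p class=\"tweet-text\">Hello <a href=\"https://example.com\">world</a>!</p>"
print(stripTags(tweet)) // Hello world!
```

Because the pattern does not depend on which tags appear, it keeps working when the tweet or the search changes.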

You can create a function to do it for you as follows:

func html2String(html:String) -> String {
    return NSAttributedString(data: html.dataUsingEncoding(NSUTF8StringEncoding)!, options:[NSDocumentTypeDocumentAttribute:NSHTMLTextDocumentType,NSCharacterEncodingDocumentAttribute:NSUTF8StringEncoding], documentAttributes: nil, error: nil)!.string
}

or as an extension:

extension String {
    var html2String:String {
        return NSAttributedString(data: dataUsingEncoding(NSUTF8StringEncoding)!, options: [NSDocumentTypeDocumentAttribute:NSHTMLTextDocumentType,NSCharacterEncodingDocumentAttribute:NSUTF8StringEncoding], documentAttributes: nil, error: nil)!.string
    }
    var html2NSAttributedString:NSAttributedString {
        return NSAttributedString(data: dataUsingEncoding(NSUTF8StringEncoding)!, options: [NSDocumentTypeDocumentAttribute:NSHTMLTextDocumentType,NSCharacterEncodingDocumentAttribute:NSUTF8StringEncoding], documentAttributes: nil, error: nil)!
    }
}

You might prefer it as an NSData extension:

extension NSData{
    var htmlString:String {
        return  NSAttributedString(data: self, options: [NSDocumentTypeDocumentAttribute:NSHTMLTextDocumentType,NSCharacterEncodingDocumentAttribute:NSUTF8StringEncoding], documentAttributes: nil, error: nil)!.string
    }
}

or as a function taking NSData:

func html2String(html:NSData)-> String {
    return  NSAttributedString(data: html, options: [NSDocumentTypeDocumentAttribute:NSHTMLTextDocumentType,NSCharacterEncodingDocumentAttribute:NSUTF8StringEncoding], documentAttributes: nil, error: nil)!.string
}

Usage:

"<div>Testing<br></div><a href=\"http://stackoverflow.com/questions/27661722/removing-everything-between-a-certain-set-of-characters-with-swift/27662573#27662573\"><span>&nbsp;Hello World !!!</span>".html2String  //  "Testing\n Hello World !!!"

let result = html2String("<div>Testing<br></div><a href=\"http://stackoverflow.com/questions/27661722/removing-everything-between-a-certain-set-of-characters-with-swift/27662573#27662573\"><span>&nbsp;Hello World !!!</span>")  //  "Testing\n Hello World !!!"
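In the usage above, the `<br>`/`</div>` turn into a line break and `&nbsp;` into a space, because the HTML importer lays the text out and decodes character entities. If you strip tags by hand instead (for example off the main thread, since Apple's documentation says the HTML importer should not be called from a background thread), you have to do those steps yourself. A rough sketch, assuming only a few common entities matter; the function name `approximatePlainText`, the list of block tags, and mapping `&nbsp;` to a plain space are hypothetical choices, and this is nowhere near a full HTML parser.

```swift
import Foundation

/// Hypothetical helper: a rough plain-text rendering of a small HTML fragment.
/// Treats <br> and a few closing block tags as line breaks, strips all other
/// tags, and decodes a handful of common named entities.
func approximatePlainText(_ html: String) -> String {
    var text = html
    // 1. Line breaks: <br>, <br/>, <br /> (any case) and closing block tags.
    text = text.replacingOccurrences(of: "(?i)<br\\s*/?>", with: "\n", options: .regularExpression)
    text = text.replacingOccurrences(of: "(?i)</(p|div|li)>", with: "\n", options: .regularExpression)
    // 2. Drop every remaining <...> run.
    text = text.replacingOccurrences(of: "<[^>]+>", with: "", options: .regularExpression)
    // 3. Decode a few entities AFTER stripping, so escaped "&lt;b&gt;" survives as text.
    //    &amp; goes last so that "&amp;lt;" ends up as "&lt;" rather than "<".
    let entities: [(String, String)] = [
        ("&nbsp;", " "), ("&lt;", "<"), ("&gt;", ">"),
        ("&quot;", "\""), ("&#39;", "'"), ("&amp;", "&")
    ]
    for (entity, replacement) in entities {
        text = text.replacingOccurrences(of: entity, with: replacement)
    }
    // 4. Collapse runs of newlines and trim the ends.
    text = text.replacingOccurrences(of: "\\n{2,}", with: "\n", options: .regularExpression)
    return text.trimmingCharacters(in: .whitespacesAndNewlines)
}

let sample = "<div>Testing<br></div><a href=\"http://example.com\"><span>&nbsp;Hello World !!!</span>"
print(approximatePlainText(sample)) // "Testing\n Hello World !!!"
```

Collapsing newlines also merges intentional blank lines between paragraphs; drop step 4 if you need to keep them.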

// Let's load this question's HTML from the web and print it as a plain String

import UIKit

class ViewController: UIViewController {
    let questionLink = "http://stackoverflow.com/questions/27661722/removing-everything-between-a-certain-set-of-characters-with-swift/27662573#27662573"
    override func viewDidLoad() {
        super.viewDidLoad()
        // Do any additional setup after loading the view, typically from a nib.
        if let questionUrl = NSURL(string: questionLink) {
            println("LOADING URL")
            if let myHtmlDataFromUrl = NSData(contentsOfURL: questionUrl){
                println(myHtmlDataFromUrl.htmlString)
            }
        }
    }
    override func didReceiveMemoryWarning() {
        super.didReceiveMemoryWarning()
        // Dispose of any resources that can be recreated.
    }
}

Quite a lot has changed in Swift over the last few years, so I wanted to post a version of Leo Dabus' answer updated to current Swift syntax.

extension String {

    func removeHTMLEncoding() throws -> String? {
        guard let data = self.data(using: .utf8) else { return nil }
        let attr = try NSAttributedString(
            data: data,
            options: [
                .documentType: NSAttributedString.DocumentType.html,
                .characterEncoding: NSNumber(value: String.Encoding.utf8.rawValue)
            ],
            documentAttributes: nil
        )
        return attr.string
    }

}

It's kind of annoying that you still need to wrap the string encoding value in an NSNumber; NSAttributedString's API is pretty out of date!
