
URL data task is not showing the right content when parsed with SwiftSoup? Swift 5

I am pretty new to Swift and have an app that performs a simple URL data task to fetch and parse the HTML contents of a website. I was trying to load certain elements, but the content I got back was not what I see when I inspect the site manually in a browser. I don't really know what the problem is.

I guess my question is: is there a way to load the content as it would appear if I visited this website manually?

Here is the relevant code:

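import Foundation
import SwiftSoup

// This excerpt runs inside a method; `link` is the page address (a String) defined elsewhere.
let config = URLSessionConfiguration.default
// Send a desktop browser user agent instead of the session default
config.httpAdditionalHeaders = ["User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"]

let session = URLSession(configuration: config)

// Avoid force-unwrapping: URL(string:) returns nil for a malformed address
guard let url = URL(string: link) else { return }

let task = session.dataTask(with: url) { data, response, error in
    // Stop on a transport error or a body that is not valid UTF-8
    guard let data = data, error == nil,
          let htmlContent = String(data: data, encoding: .utf8) else {
        print("error")
        return
    }

    do {
        let doc: Document = try SwiftSoup.parse(htmlContent)

        let elements = try doc.getAllElements().array()

    } catch Exception.Error(type: let type, Message: let message) {
        print(type)
        print(message)
    } catch {
        print("error")
    }
}
// A data task is created suspended and only runs once resumed
task.resume()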
            

Please let me know if there is any way to do this, even if it involves using a different package to parse the data. It is very important for my app. I would highly appreciate any help!

Thanks.

I suspect the issue may be the user agent that is sent to the website whose response you are parsing.

The user agent is a string that is sent with the request to the URL (as an additional header). It identifies what kind of client is making the request so that an appropriate response can be sent.

For example, if you are requesting from Safari on a Mac running Big Sur, the user agent might be:

"Mozilla/5.0 (Macintosh; Intel Mac OS X 11_5_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Safari/605.1.15"

Whereas from an iPad it might be:

"Mozilla/5.0 (iPad; CPU OS 14_7_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Mobile/15E148 Safari/604.1"

The site serving the request uses the user agent to determine what kind of response to return and what features to include (full site, mobile site, text site, etc).

For a URLSession in a Swift app, the default user agent is based on the app's bundle name (typically followed by CFNetwork and Darwin version tokens) rather than a browser-style string. So the site may be getting confused by that and returning something different from what you see when you visit it in a browser.
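To make the effect concrete, here is a rough sketch of the kind of check a server might perform. The `SiteVariant` enum, the `siteVariant(forUserAgent:)` function, its matching rules, and the `MyApp/1 ...` string are all hypothetical, made up for illustration; real sites use their own, usually more elaborate, detection.

```swift
import Foundation

// Hypothetical response types a site might choose between.
enum SiteVariant {
    case mobile    // mobile layout
    case desktop   // full site
    case fallback  // reduced or placeholder page for unrecognized clients
}

// Hypothetical server-side rule: pick a variant from the User-Agent string.
func siteVariant(forUserAgent userAgent: String) -> SiteVariant {
    // Mobile browsers usually advertise a device name or a "Mobile" token
    if userAgent.contains("Mobile") || userAgent.contains("iPad") || userAgent.contains("iPhone") {
        return .mobile
    }
    // Desktop browsers conventionally start with "Mozilla/5.0"
    if userAgent.contains("Mozilla/5.0") {
        return .desktop
    }
    // Anything else (such as an app-name style user agent) is not recognized
    return .fallback
}

let macSafari = "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_5_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Safari/605.1.15"
let iPadSafari = "Mozilla/5.0 (iPad; CPU OS 14_7_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Mobile/15E148 Safari/604.1"
// Assumed shape of an app-name based user agent; the exact tokens vary by OS version.
let appStyle = "MyApp/1 CFNetwork/1240.0.4 Darwin/20.6.0"
```

Under these assumed rules the two browser strings map to the desktop and mobile variants, while the app-style string falls through to the fallback case, which is one way a parsed response can end up differing from what a browser shows.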

Some options:

Explore the site; it might have a better URL to use to get the info you are after.

Change the user-agent string you are sending. The basic steps are:

let config = URLSessionConfiguration.default
config.httpAdditionalHeaders = ["User-Agent": "User-Agent String Here"]
let session = URLSession(configuration: config)

You may need to adapt your use of the shared session to support this: either create a session with your config and use that, as above, or keep the shared session and override the header on the individual request.
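For the second route, one possibility is to set the header on a `URLRequest` and hand that request to the shared session. This is a minimal sketch: `makeRequest(url:userAgent:)` and `fetchHTML(with:completion:)` are hypothetical helper names, and `https://example.com` is a placeholder address.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // needed for URLRequest/URLSession on Linux
#endif

// Hypothetical helper: build a request that carries its own User-Agent header.
func makeRequest(url: URL, userAgent: String) -> URLRequest {
    var request = URLRequest(url: url)
    // A header set on the request applies to this request only
    request.setValue(userAgent, forHTTPHeaderField: "User-Agent")
    return request
}

// Hypothetical helper: fetch the body as UTF-8 text using the shared session.
func fetchHTML(with request: URLRequest, completion: @escaping (String?) -> Void) {
    let task = URLSession.shared.dataTask(with: request) { data, _, error in
        guard let data = data, error == nil else {
            completion(nil)
            return
        }
        completion(String(data: data, encoding: .utf8))
    }
    task.resume()  // tasks start suspended
}

// Placeholder URL; substitute the page you actually want to parse.
let pageURL = URL(string: "https://example.com")!
let browserUserAgent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_5_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Safari/605.1.15"
let request = makeRequest(url: pageURL, userAgent: browserUserAgent)
```

Calling `fetchHTML(with: request) { html in ... }` would then deliver the page source to the closure, where it can be passed to the parser. Note that this only changes what the server is told about the client; it does not run any JavaScript on the page.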

I found a solution that works for me. Here is the relevant code:

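import UIKit
import WebKit
import SwiftSoup

// Inside a UIViewController subclass that conforms to WKNavigationDelegate
private let webView: WKWebView = {
    let prefs = WKPreferences()
    prefs.javaScriptEnabled = true
    let config = WKWebViewConfiguration()
    config.preferences = prefs
    let webView = WKWebView(frame: .zero, configuration: config)
    return webView
}()

override func viewDidLoad() {
    super.viewDidLoad()

    view.addSubview(webView)
    webView.navigationDelegate = self
    // The page itself is loaded elsewhere, e.g. webView.load(URLRequest(url: url))
}

// Called once the main document has finished loading
func webView(_ webView: WKWebView, didFinish navigation: WKNavigation!) {
    parseData()
}

func parseData() {
    // Wait a fixed 5 seconds so scripts on the page can finish rendering.
    // Capture self weakly so a dismissed view controller does not crash here.
    DispatchQueue.main.asyncAfter(deadline: .now() + 5.0) { [weak self] in
        self?.webView.evaluateJavaScript("document.body.innerHTML") { result, error in
            // The result arrives as Any?; make sure it really is a String
            guard let htmlContent = result as? String, error == nil else {
                print("error")
                return
            }

            do {
                let doc = try SwiftSoup.parse(htmlContent)
                let allProducts = try doc.getAllElements().array()
            } catch {
                print("error")
            }
        }
    }
}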

Using a WebView to load the website first and then parsing the data after a delay is a working solution for me. A fixed delay might not be the best idea, so if anyone has any other suggestion, it would be highly appreciated!
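One possible alternative to the fixed delay is to poll the page until the element you need exists, and give up after a bounded number of tries. The sketch below is an assumption-laden illustration, not a tested drop-in: `poll`, `elementExistsScript(selector:)`, and `waitForElement(_:in:completion:)` are hypothetical helpers, and the attempt count and interval are arbitrary.

```swift
import Foundation
#if canImport(WebKit)
import WebKit
#endif

// Hypothetical helper: run `check` until it reports true or attempts run out.
// The first check runs immediately; later ones are spaced by `interval`.
func poll(attempts: Int,
          interval: TimeInterval,
          check: @escaping (@escaping (Bool) -> Void) -> Void,
          completion: @escaping (Bool) -> Void) {
    guard attempts > 0 else {
        completion(false)
        return
    }
    check { ready in
        if ready {
            completion(true)
        } else if attempts == 1 {
            completion(false)  // that was the last attempt
        } else {
            DispatchQueue.main.asyncAfter(deadline: .now() + interval) {
                poll(attempts: attempts - 1, interval: interval,
                     check: check, completion: completion)
            }
        }
    }
}

// Hypothetical helper: JavaScript that is true once `selector` matches an element.
func elementExistsScript(selector: String) -> String {
    // Escape backslashes and single quotes so the selector fits in a JS string literal
    let escaped = selector
        .replacingOccurrences(of: "\\", with: "\\\\")
        .replacingOccurrences(of: "'", with: "\\'")
    return "document.querySelector('\(escaped)') !== null"
}

#if canImport(WebKit)
// Hypothetical helper: wait (up to about 10 seconds) for an element to appear.
func waitForElement(_ selector: String,
                    in webView: WKWebView,
                    completion: @escaping (Bool) -> Void) {
    let script = elementExistsScript(selector: selector)
    poll(attempts: 20, interval: 0.5, check: { report in
        webView.evaluateJavaScript(script) { result, _ in
            // A JavaScript boolean bridges to Bool; anything else counts as "not ready"
            report((result as? Bool) ?? false)
        }
    }, completion: completion)
}
#endif
```

In `webView(_:didFinish:)`, something like `waitForElement(".product", in: webView) { found in if found { /* read innerHTML and parse */ } }` could replace the 5 second wait, where `.product` stands in for whatever selector identifies the content you need. The trade-off is that you must know a selector that only appears once the dynamic content has rendered.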

