How to parse HTML that has the updated DOM? Swift

I am fairly new to coding and am parsing HTML data from a website. The problem is that the elements I can inspect manually when I view the website are very different from the source code. I understand this is because 'Inspect Element' shows the state of the DOM tree after the browser has applied its error correction and after any JavaScript has manipulated the DOM.

Here is the relevant code:

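import Foundation
import SwiftSoup

// This snippet sits inside a method; `link` is the page address as a String, defined elsewhere.
guard let url = URL(string: link) else {
    print("Invalid URL: \(link)")
    return
}

let task = URLSession.shared.dataTask(with: url) { data, response, error in
    // Bail out on a transport error or an undecodable body instead of force-unwrapping.
    guard let data = data, error == nil,
          let htmlContent = String(data: data, encoding: .utf8) else {
        print(error?.localizedDescription ?? "No data, or the response is not UTF-8")
        return
    }
    do {
        // This parses the raw server response only; no JavaScript has run on it.
        let doc: Document = try SwiftSoup.parse(htmlContent)
        let elements = try doc.getAllElements().array()
        print(elements.count)
    } catch Exception.Error(type: let type, Message: let message) {
        print(type)
        print(message)
    } catch {
        print("error")
    }
}
task.resume() // A data task does not start until resume() is called.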

My question is: what can I do to parse the elements of the website that appear when I inspect them manually? Sorry if this is a beginner question.

As you noticed, the web page you see after it loads in a browser differs from what you get when you request the page in code. That is because some web pages load data or additional HTML 'lazily', only when it is needed, to improve performance.
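One quick way to confirm that this is what is happening: take some text you can see in the inspector and check whether it appears in the raw response at all. The helper below is only a sketch; the name `rawHTMLContains` is made up for illustration, and it assumes the page is UTF-8 encoded.

```swift
import Foundation

/// Returns true when `marker` (some text visible in the browser inspector)
/// is already present in the raw bytes returned by the server.
/// Hypothetical helper; assumes the response body is UTF-8.
func rawHTMLContains(_ marker: String, in data: Data) -> Bool {
    guard let html = String(data: data, encoding: .utf8) else { return false }
    return html.contains(marker)
}
```

If this returns false for the `data` received in the `dataTask` completion handler, the content is most likely added after the initial page load, so parsing the raw response cannot find it.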

To get this HTML in code, you need to analyze the Network 'XHR' tab in the developer tools of your browser. It lists the requests the page makes after it has loaded, and you should be able to find the missing HTML in one of their responses.
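Once you have found the request in the XHR tab, you can usually call the same URL yourself. The following is a minimal sketch, assuming the endpoint returns a JSON array of objects with a `title` field. The `Item` type, the field name, the headers, and the function names are all placeholders for illustration; copy the real URL, headers, and response shape from the developer tools.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking // Needed for URLSession on Linux.
#endif

// Hypothetical shape of the JSON the endpoint returns; adjust to the real response.
struct Item: Decodable {
    let title: String
}

// Builds a request similar to the one the browser sends.
// Both headers are assumptions; some sites need neither, others need more.
func makeXHRRequest(url: URL) -> URLRequest {
    var request = URLRequest(url: url)
    request.setValue("application/json", forHTTPHeaderField: "Accept")
    request.setValue("XMLHttpRequest", forHTTPHeaderField: "X-Requested-With")
    return request
}

// Decodes the response body; kept separate so it can be tried on saved data.
func decodeItems(from data: Data) throws -> [Item] {
    try JSONDecoder().decode([Item].self, from: data)
}

// Fetches and decodes the items, reporting the outcome through `completion`.
func fetchItems(from url: URL,
                completion: @escaping (Result<[Item], Error>) -> Void) {
    let task = URLSession.shared.dataTask(with: makeXHRRequest(url: url)) { data, _, error in
        if let error = error {
            completion(.failure(error))
            return
        }
        guard let data = data else {
            completion(.failure(URLError(.badServerResponse)))
            return
        }
        completion(Result(catching: { try decodeItems(from: data) }))
    }
    task.resume()
}
```

If the response turns out to be an HTML fragment rather than JSON, skip the decoding step and pass the response string to `SwiftSoup.parse`, as in the question's code.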
