
Using DOMXPath to extract JSON data

I tried PHP Simple HTML DOM on this problem without success. I have now moved to DOMDocument and DOMXPath, which look promising.

Here is my issue: I am trying to scrape data that is loaded via a web service request after the page initially shows. The delay is only milliseconds, but because of it, normal scraping returns a template placeholder instead of the actual data. I found the endpoint URL in the Network tab of Chrome's developer tools. If I enter that URL into the browser address bar, the data displays nicely in JSON format. All good.

My problem arises because every time the site is revisited or the page is refreshed, the suffix of the endpoint URL is randomly generated, so I can't hard-code the URL into my PHP file. For example, the end of the URL is "? =253648592" on the first visit, but on refresh it could be "? =375482910". The base of the URL is static.

Without getting into headless browsers (I tried, and MY head hurts!), is there a way to have XPath find this random URL when the page loads?

Sorry for being so long-winded, but I wanted to explain as best I could.

If you only need one item/value from the HTML, it's probably much easier and faster to just use a regex. I would like to give an example, but for that I would need a more extended snippet of the HTML that contains the endpoint you want to fetch.
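In the meantime, here is a rough sketch of the regex idea. It assumes the full endpoint URL appears somewhere in the initial HTML (for instance inside an inline script), which may not be true for your page. The base URL, the `token` parameter name, the sample markup, and the function name `findEndpointUrl` are all made up for illustration.

```php
<?php
// Hypothetical static base of the endpoint; replace it with the real one.
const ENDPOINT_BASE = 'https://example.com/api/data';

/**
 * Return the first URL in $html that starts with $base,
 * including its (random) query string, or null if none is found.
 */
function findEndpointUrl(string $html, string $base): ?string
{
    // Match the base, then everything up to a quote, whitespace or angle bracket.
    $pattern = '~' . preg_quote($base, '~') . '[^"\'\s<>]*~';
    if (preg_match($pattern, $html, $m) === 1) {
        // In HTML source "&" may be written as "&amp;", so decode entities.
        return html_entity_decode($m[0]);
    }
    return null;
}

// Assumed sample markup; the real page will look different.
$html = '<script>var feed = "https://example.com/api/data?token=253648592";</script>';
$url  = findEndpointUrl($html, ENDPOINT_BASE);
```

With a real page you would fill `$html` from something like `file_get_contents()` on the page URL, then request `$url` the same way and pass the response to `json_decode($json, true)`. If the random suffix is generated by JavaScript at run time and never appears in the HTML, this approach will not find it.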

Could you post a snippet of the HTML that contains the endpoint?
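As for doing it with XPath: DOMXPath can narrow the search to the node that holds the URL, but only if the URL is present in the markup that PHP downloads. Below is a minimal sketch under that assumption. It first looks for any attribute value starting with the static base, then falls back to `<script>` elements whose text mentions it. The base URL, the `data-url` attribute, and the function name `findEndpointInDom` are hypothetical.

```php
<?php
/**
 * Look for a URL starting with $base in attribute values or inline scripts.
 * Assumes $base contains no double quote, since it is embedded in the XPath string.
 */
function findEndpointInDom(string $html, string $base): ?string
{
    $doc = new DOMDocument();
    // Collect parser warnings from imperfect real-world markup instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);

    // Case 1: the URL is an attribute value, e.g. a hypothetical data-url="...".
    $attrs = $xpath->query(sprintf('//@*[starts-with(., "%s")]', $base));
    if ($attrs !== false && $attrs->length > 0) {
        return $attrs->item(0)->nodeValue;
    }

    // Case 2: the URL is buried in inline JavaScript; XPath finds the script,
    // a small regex pulls the URL out of its text.
    $scripts = $xpath->query(sprintf('//script[contains(., "%s")]', $base));
    $pattern = '~' . preg_quote($base, '~') . '[^"\'\s<>]*~';
    foreach ($scripts as $script) {
        if (preg_match($pattern, $script->textContent, $m) === 1) {
            return $m[0];
        }
    }
    return null;
}

// Assumed sample markup for illustration only.
$sample = '<html><body><div data-url="https://example.com/api/data?token=375482910"></div></body></html>';
$found  = findEndpointInDom($sample, 'https://example.com/api/data');
```

XPath on its own cannot cut a URL out of the middle of script text, which is why the second case still needs a regex.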
