Web scraping Groupon

I want to scrape groupon.com. My problem is that sites like this ask you to join their email service the first time you load a page, but when you reload the page they show its content directly. How do I handle this? I am using PHP for my scripting.

Also, if anyone could suggest a PHP framework or library that makes scraping easy, that would be great.

Thanks.

I would look into the cURL library for grabbing website content. I'm not sure exactly what information you want to scrape, or whether the refresh will cause an issue, but hopefully this gets you started.
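
To make the cURL suggestion concrete, here is a minimal sketch in PHP. It rests on an assumption the question does not confirm: that the site remembers the first visit through a cookie, so a second request sent with the same cookie jar behaves like the reload described above. The function names (`fetch_page`, `fetch_after_reload`, `extract_text`) and the user-agent string are made up for illustration; only the built-in cURL and DOM extensions are used.

```php
<?php
// Sketch only. Assumption: the signup prompt is suppressed by a cookie
// that the site sets on the first visit.

/** Fetch a URL with cURL, persisting cookies in $cookieFile between calls. */
function fetch_page(string $url, string $cookieFile): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow redirects
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are written here when the handle is released
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies are read from here for this request
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; example-scraper)', // hypothetical value
        CURLOPT_TIMEOUT        => 30,
    ]);
    $html = curl_exec($ch);
    $error = curl_error($ch);
    unset($ch); // release the handle so the cookie jar is flushed to disk
    if ($html === false) {
        throw new RuntimeException("cURL request failed: $error");
    }
    return $html;
}

/** Load the page twice with a shared cookie jar and return the second response. */
function fetch_after_reload(string $url): string
{
    $cookieFile = tempnam(sys_get_temp_dir(), 'cookies');
    try {
        fetch_page($url, $cookieFile);        // first visit: expect the signup prompt
        return fetch_page($url, $cookieFile); // "reload": the stored cookies are sent back
    } finally {
        unlink($cookieFile);
    }
}

/** Return the trimmed text of every node matching an XPath query in an HTML string. */
function extract_text(string $html, string $xpathQuery): array
{
    if ($html === '') {
        return [];
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $results = [];
    foreach ((new DOMXPath($doc))->query($xpathQuery) as $node) {
        $results[] = trim($node->textContent);
    }
    return $results;
}
```

A call such as `extract_text(fetch_after_reload('http://www.groupon.com/'), '//title')` would then return the page title from the second load. If the prompt is controlled by something other than a cookie (JavaScript, for example), this approach will not be enough on its own. The built-in `DOMDocument` and `DOMXPath` classes may also cover the request for a library that makes extraction easier, since they need no extra installation.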

Do you have to stick with PHP for the scraping? TestPlan makes this type of testing easy. You can either access the page a second time, or simply use TestPlan to sign up for their email list to gain extended access to their site.

Here's a rough example that takes you to the main page and closes the little popup:

GotoURL http://www.groupon.com/
Click id:step_one

SubmitForm with
    %Params:subscription[email_address]% somewhere@test.domain.xx
end

Click id:close

If it helps, they have an API available: http://www.groupon.com/pages/api
