YQL query redirected from target URL

I'm trying to scrape a website, but when I try to fetch it using YQL, I get redirected to the site's homepage instead of the page I'm trying to get content from.

Does anybody know what I could do to prevent my request from being redirected, or any other way to avoid this issue?

Here is the request I'm trying to perform, which is failing:

Target site:
http://gticket.imagix.be/os1.aspx

Request in the YQL console:
http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22http%3A%2F%2Fgticket.imagix.be%2Fos1.aspx%22&diagnostics=true
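For reference, the console URL above is just the YQL statement percent-encoded into the `q` parameter of the public REST endpoint. A minimal sketch that rebuilds it, assuming Python's standard `urllib.parse.quote` (the helper name `yql_url` is hypothetical; note that Yahoo has since retired the public YQL service, so the URL itself will no longer answer):

```python
from urllib.parse import quote

YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def yql_url(target_url, diagnostics=True):
    """Build a public YQL REST URL for: select * from html where url="..."."""
    statement = f'select * from html where url="{target_url}"'
    # Keep '*' literal to match the console URL; everything else
    # (spaces, '=', quotes, ':' and '/') is percent-encoded.
    query = quote(statement, safe="*")
    url = f"{YQL_ENDPOINT}?q={query}"
    if diagnostics:
        url += "&diagnostics=true"
    return url


print(yql_url("http://gticket.imagix.be/os1.aspx"))
```

The printed URL is identical to the one used in the console, which confirms the query itself is well-formed and the problem lies elsewhere.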

It's not because of YQL; the page itself responds with a 302 redirect. If you put this URL directly into your browser's address bar or click it, you can see that it gets redirected to the site's homepage, and you can't prevent that.

This is the YQL result for the page after the redirect.
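To see that the redirect comes from the server's own 302 response and not from YQL, you can compare a client that does not follow redirects with one that does. The sketch below is self-contained: it starts a hypothetical local stand-in for the target site (assumed to answer `/os1.aspx` with `302` and `Location: /`; the real site's headers may differ) and queries it with the standard library's `http.client` (which never follows redirects) and `urllib.request` (which follows them, like a browser):

```python
import http.client
import http.server
import threading
import urllib.request


class RedirectingHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for the target site: /os1.aspx answers 302 -> /."""

    def do_GET(self):
        if self.path == "/os1.aspx":
            self.send_response(302)
            self.send_header("Location", "/")
            self.end_headers()
        else:
            body = b"homepage"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def raw_status(host, port, path):
    """Send one GET without following redirects; return (status, Location)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def followed_url(url):
    """Fetch like a browser would: urllib follows the 302 automatically."""
    # Bypass any environment proxy so the request reaches the local server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as resp:
        return resp.geturl(), resp.read()


server = http.server.HTTPServer(("127.0.0.1", 0), RedirectingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

status, location = raw_status(host, port, "/os1.aspx")
final_url, body = followed_url(f"http://{host}:{port}/os1.aspx")
print(status, location)  # 302 /
print(final_url, body)
server.shutdown()
server.server_close()
```

The raw request only sees the 302 and its `Location` header, while the redirect-following client ends up on the homepage. Any client that follows redirects, YQL included, will land on the same page.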

Update:

Also remember that if a website chooses to block YQL with a robots.txt directive, YQL won't be allowed to access it. So a site can reject YQL requests if it has been set up that way; here is an article about blocking YQL.
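Because robots.txt rules are grouped per user agent, a site can disallow one fetcher while still allowing everyone else. A minimal sketch of that check using Python's standard `urllib.robotparser`; the robots.txt contents and the agent name `ExampleFetcher` are hypothetical placeholders, not YQL's actual user-agent string:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named agent is blocked everywhere,
# all other agents are only kept out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleFetcher
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, agent, url):
    """Return True if `agent` may fetch `url` under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(allowed(ROBOTS_TXT, "ExampleFetcher", "http://example.com/os1.aspx"))  # False
print(allowed(ROBOTS_TXT, "SomeBrowser", "http://example.com/os1.aspx"))     # True
```

A well-behaved fetcher runs this kind of check before requesting the page, which is why a site can refuse YQL without affecting regular browsers.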

There is also a followRedirects option in YQL that you can use. Check here.

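Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM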