YQL query redirected from target URL
I'm trying to scrape a website, but when I connect to it using YQL, I get redirected to the homepage of the website instead of the page I'm trying to get content from.
Does anybody know how I could prevent my request from being redirected, or any other way to avoid this issue?
Here is a link to the request I'm trying to perform, which is failing:
Target site:
http://gticket.imagix.be/os1.aspx
Request in Yahoo Console:
http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22http%3A%2F%2Fgticket.imagix.be%2Fos1.aspx%22&diagnostics=true
It's not because of YQL. The target URL itself responds with a 302 redirect. If you put this URL directly in the browser's address bar or click it, you can see that it is redirected to the home page of the site, and you can't prevent that from the client side.
This is the YQL result of the page after redirection.
Update:
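To confirm that the redirect comes from the site and not from YQL, one option is to request the page without following redirects and look at the status code and `Location` header. The following is a minimal sketch using only the Python standard library; the helper name `first_hop` is hypothetical, and what the site actually returns may differ from what is described above.

```python
import http.client
from urllib.parse import urlsplit


def first_hop(url):
    """Return (status, Location header) for url without following redirects.

    http.client never follows redirects on its own, so the values returned
    here describe the server's very first response.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        # Rebuild the request target (path plus optional query string).
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("GET", target)
        response = conn.getresponse()
        # Location is None when the response is not a redirect.
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

Calling `first_hop("http://gticket.imagix.be/os1.aspx")` should return a 3xx status together with the redirect target if the server behaves as described in this answer.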
Also remember that if a website chooses to block YQL using a robots.txt directive, you won't be allowed to access it. So a site can reject YQL requests if it has been set up that way, and here is an article about blocking YQL.
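As a rough way to see whether a given robots.txt would block a crawler, the rules can be evaluated locally with Python's `urllib.robotparser`. This is a sketch under assumptions: the user-agent string `Yahoo Pipes 2.0` is assumed here to be the one YQL identifies itself with, the robots.txt content is a made-up example rather than the target site's real file, and `is_allowed` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Assumption: the user-agent YQL sends; check the article for the real value.
YQL_USER_AGENT = "Yahoo Pipes 2.0"

# Hypothetical robots.txt that blocks only that user-agent.
SAMPLE_ROBOTS_TXT = """\
User-agent: Yahoo Pipes 2.0
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://gticket.imagix.be/os1.aspx"
    print(is_allowed(SAMPLE_ROBOTS_TXT, YQL_USER_AGENT, page))
    print(is_allowed(SAMPLE_ROBOTS_TXT, "SomeOtherBot", page))
```

To test a real site, the text of its own `/robots.txt` would be passed in place of `SAMPLE_ROBOTS_TXT`.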