urllib2.urlopen(url).read() fails to read the URL content
I am trying to read the web content of the link http://www.quikr.com/Mobile-Phones/y149 using the following Python code:
import urllib2
hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11'}
url = 'http://www.quikr.com/Mobile-Phones/y149'
req = urllib2.Request(url, headers=hdr)
page = urllib2.urlopen(req).read()
print page
This gives the following output:
<!DOCTYPE html>
<head>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<meta http-equiv="cache-control" content="max-age=0" />
<meta http-equiv="cache-control" content="no-cache" />
<meta http-equiv="expires" content="0" />
<meta http-equiv="expires" content="Tue, 01 Jan 1980 1:00:00 GMT" />
<meta http-equiv="pragma" content="no-cache" />
<meta http-equiv="refresh" content="10; url=/distil_r_captcha.html?Ref=/Mobile-Phones/y149&distil_RID=97C53AFC-AA02-11E5-B76A-8C12C4D2AB6C&distil_TID=20151224055301" />
<script type="text/javascript">
(function(window){
try {
if (typeof sessionStorage !== 'undefined'){
sessionStorage.setItem('distil_referrer', document.referrer);
}
} catch (e){}
})(window);
</script>
<script type="text/javascript" src="/QkrDIV1cexsvzwdadarecara.js" defer></script><style type="text/css">#d__fFH{position:absolute;top:-5000px;left:-5000px}#d__fF{font-family:serif;font-size:200px;visibility:hidden}#qttwcrxueetv{display:none!important}</style></head>
<body>
<div id="distil_ident_block"> </div>
</body>
</html>
Is there any workaround to read the actual URL content? Any help is appreciated. Thanks in advance!
One option would be to automate a real browser via selenium. Working sample:
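from selenium import webdriver

# Drive a real Firefox instance instead of sending a bare HTTP request.
driver = webdriver.Firefox()
driver.get("http://www.quikr.com/Mobile-Phones/y149")

# Print the title link text of every ad on the page.
# Note: Selenium 4 replaced find_element(s)_by_css_selector with find_element(s)(By.CSS_SELECTOR, ...).
for phone in driver.find_elements_by_css_selector(".snb_entire_ad"):
    link = phone.find_element_by_css_selector("a.adttllnk")
    print(link.text)

driver.close()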
If you want to get the page source, use .page_source (before closing the driver, of course):
print(driver.page_source)
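As a sanity check on whatever HTML comes back, it may help to test whether it is the bot-detection interstitial from the question rather than the real listing. The following is only a minimal sketch: the helper names looks_like_block_page and meta_refresh_target are hypothetical, and the markers are simply the ones visible in the blocked output above, so other sites or later versions of the protection may look different.

```python
import re

# Markers copied from the blocked response shown in the question.
# Assumption: they are representative; other deployments may use different ones.
BLOCK_MARKERS = ("distil_r_captcha", "distil_ident_block")

# Matches <meta http-equiv="refresh" content="10; url=..."> and captures the target URL.
META_REFRESH_RE = re.compile(
    r'<meta\s+http-equiv=["\']refresh["\']\s+content=["\']\s*\d+\s*;\s*url=([^"\']+)["\']',
    re.IGNORECASE,
)


def looks_like_block_page(html):
    """Return True if the HTML contains any of the known block-page markers."""
    return any(marker in html for marker in BLOCK_MARKERS)


def meta_refresh_target(html):
    """Return the URL of a meta-refresh redirect, or None if there is none."""
    match = META_REFRESH_RE.search(html)
    return match.group(1) if match else None
```

With the selenium sample, this could be called as looks_like_block_page(driver.page_source) before parsing the ads; with the urllib2 code from the question, pass in page instead.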