
How to parse JavaScript in a webpage?

I am trying to parse a webpage using Python 2.7, and I want to read its entire HTML code. But the result looks like this:

<html><head><script type="text/javascript">
location.replace( "http://captcha.search.daum.net/captcha/show?url=http%3A%2F%2Fsearch.daum.net%2Fsearch%3Fw%3Dnews%26nil_search%3Dbtn%26DA%3DNTB%26enc%3Dutf8%26cluster%3Dy%26cluster_page%3D1%26q%3D%25EB%25B3%25B4%25EA%25B3%25A0%25EC%2584%259C" );
</script>
</head></html>

I think this webpage is using JavaScript. How can I parse the entire HTML code contained in the JavaScript?

My Python code is this:

#-*- coding: utf-8 -*-

import urllib2
from bs4 import BeautifulSoup

url = "http://search.daum.net/search?w=news&nil_search=btn&DA=NTB&enc=utf8&cluster=y&cluster_page=1&q=%EB%B3%B4%EA%B3%A0%EC%84%9C"
page = urllib2.urlopen(url)
soup = BeautifulSoup(page.read())

print soup

It seems some headers are required for this page to be shown properly.

Try adding headers to your request before you pass the response to BeautifulSoup. Send the same headers and parameters that your browser sends, so that you get the same result you see in the browser.
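As a rough illustration of that suggestion, here is a minimal sketch that attaches browser-like headers to the request. The header values, `BROWSER_HEADERS`, `build_request`, and `fetch_html` are assumptions made up for this example, not something taken from the site; copy the real headers from your browser's developer tools. Whether this is enough to avoid the captcha redirect has not been verified.

```python
# -*- coding: utf-8 -*-
try:
    from urllib2 import Request, urlopen  # Python 2.7, as in the question
except ImportError:
    from urllib.request import Request, urlopen  # Python 3 equivalent

# Assumed, illustrative values: replace them with the headers your own
# browser sends for this page.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "ko-KR,ko;q=0.9,en;q=0.8",
}


def build_request(url, headers=None):
    """Return a Request object that carries browser-like headers."""
    return Request(url, headers=headers or BROWSER_HEADERS)


def fetch_html(url):
    """Fetch the page body (bytes) using the headers above."""
    response = urlopen(build_request(url))
    try:
        return response.read()
    finally:
        response.close()
```

With the URL from the question, the result can then be parsed as before, for example `soup = BeautifulSoup(fetch_html(url), "html.parser")`. If the server still returns the `location.replace(...)` captcha page, it is probably checking something beyond these headers (cookies or request rate, for instance).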
