
Crawling a web page after executing JavaScript using Selenium and BeautifulSoup

I want to scrape a page after executing a JavaScript "click" event.

The page's script looks like this:

function initPage() {
    initCorpInfo();

    var Tree = Ext.tree;

    var treeRoot = new Tree.TreeNode({
        text: "total",
        id: "root",
        href: "javascript: viewDoc('20150515001896', '4671059', null, null, null, 'dart3.xsd')"
    });

    treeNode2 = new Tree.TreeNode({
        text: "4. financial statement",
        id: "17",
        cls: "text",
        listeners: {
            click: function() {viewDoc('20150515001896', '4671059', '17', '1015699', '132786',

    });
}

function viewDoc(rcpNo, dcmNo, eleId, offset, length, dtd) {
    currentDocValues.rcpNo = rcpNo;
    currentDocValues.dcmNo = dcmNo;
    currentDocValues.eleId = eleId;
    currentDocValues.offset = offset;
    currentDocValues.length = length;
    currentDocValues.dtd = dtd;

    var params = "";
    params += "?rcpNo=" + rcpNo;
    params += "&dcmNo=" + dcmNo;
    if (eleId != null)
        params += "&eleId=" + eleId;
    if (offset != null)
        params += "&offset=" + offset;
    if (length != null)
        params += "&length=" + length;
    params += "&dtd=" + dtd;
    document.getElementById("ifrm").src = "/report/viewer.do" + params;
}
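Since viewDoc() only assembles a query string and assigns it to the iframe's src, the same URL can be built outside the browser. Below is a minimal Python sketch of that logic. The helper name build_viewer_url and the BASE_URL constant are hypothetical (the host is taken from the URL in this question), and it is an assumption that the server will serve /report/viewer.do to a direct request without the session state a browser would carry.

```python
from urllib.parse import urlencode, urljoin

# Assumption: the iframe's relative src resolves against the host of the page in the question.
BASE_URL = "http://dart.fss.or.kr"


def build_viewer_url(rcp_no, dcm_no, ele_id=None, offset=None, length=None, dtd="dart3.xsd"):
    """Mirror viewDoc(): eleId, offset and length are skipped when they are None."""
    params = [("rcpNo", rcp_no), ("dcmNo", dcm_no)]
    for key, value in (("eleId", ele_id), ("offset", offset), ("length", length)):
        if value is not None:
            params.append((key, value))
    params.append(("dtd", dtd))  # viewDoc() always appends dtd last
    return urljoin(BASE_URL, "/report/viewer.do") + "?" + urlencode(params)


url = build_viewer_url("20150515001896", "4671059", "17", "1015699", "132786")
print(url)
```

If the direct request works, the resulting URL could be fetched with any HTTP client and parsed without a browser; if it does not, driving the page with Selenium remains the fallback.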

View source: http://dart.fss.or.kr/dsaf001/main.do?rcpNo=20150515001896 (click "4. 재무제표" in the left sidebar)

Can I use Selenium and BeautifulSoup to execute "click: function() {viewDoc('20150515001896', '4671059', '17', '1015699', '132786'"?

Should I use Scrapy instead of BeautifulSoup to run the JavaScript function?

I solved it simply like this:

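from selenium import webdriver
from bs4 import BeautifulSoup
import time

browser = webdriver.Firefox()
browser.get("http://dart.fss.or.kr/dsaf001/main.do?rcpNo=20150515001896")

# Call the page's own viewDoc() so the report loads into the "ifrm" iframe.
browser.execute_script("viewDoc('20150515001896', '4671059', '17', '1015699', '132786', 'dart3.xsd');")
time.sleep(3)  # crude wait for the iframe to load; the delay is a guess

# page_source of the top-level document does not include the iframe's content,
# so switch into the iframe before parsing.
browser.switch_to.frame("ifrm")
soup = BeautifulSoup(browser.page_source, "html.parser")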
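The execute_script call above hardcodes the viewDoc() arguments for one report. A possible way to generalize it is to pull the calls out of the page source with a regular expression. This is only a sketch: find_viewdoc_calls and to_js_call are hypothetical helper names, the pattern assumes every argument is a single-quoted string or null with no commas inside it, and the second sample line uses the complete call from the answer rather than the truncated one in the question.

```python
import re

# Matches viewDoc(...) calls whose arguments are all single-quoted strings or null.
# The function definition itself (viewDoc(rcpNo, dcmNo, ...)) does not match.
VIEWDOC_CALL = re.compile(
    r"viewDoc\(\s*((?:'[^']*'|null)(?:\s*,\s*(?:'[^']*'|null))*)\s*\)"
)


def find_viewdoc_calls(page_source):
    """Return one argument list per viewDoc() call; null becomes None."""
    calls = []
    for match in VIEWDOC_CALL.finditer(page_source):
        raw_args = [a.strip() for a in match.group(1).split(",")]
        calls.append([None if a == "null" else a.strip("'") for a in raw_args])
    return calls


def to_js_call(args):
    """Render an argument list back into a JavaScript viewDoc(...) statement."""
    rendered = ", ".join("null" if a is None else "'%s'" % a for a in args)
    return "viewDoc(%s);" % rendered


sample = """
href: "javascript: viewDoc('20150515001896', '4671059', null, null, null, 'dart3.xsd')"
click: function() {viewDoc('20150515001896', '4671059', '17', '1015699', '132786', 'dart3.xsd');}
"""

for args in find_viewdoc_calls(sample):
    print(to_js_call(args))
```

With a live session, each string returned by to_js_call could be passed to browser.execute_script in place of the hardcoded call.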

