
Fastest way to traverse or find elements in DIV HTML

I am writing a utility that should hit the URL of a dynamic page, retrieve the content, search for a specific div tag among various nested div tags, and grab its content.

Mainly, I am looking for Java code or a Java library. JavaScript or a JavaScript-based library would also work for me.

I have shortlisted the following: JSoup, Jerry, and JTidy (last updated 2009-12-01). Which one is best performance-wise?

Edit: Rephrased the question and added the shortlisted libraries.

If you want to scrape a page and parse it, I recommend using Node.js with jsdom.

Install Node.js (assuming Linux):

sudo apt-get install git
cd ~
git clone https://github.com/joyent/node
cd node
git checkout v0.6
mkdir ~/.local # If it doesn't already exist
./configure --prefix=$HOME/.local
make
make install

There is also a Windows installer: http://nodejs.org/dist/v0.6.6/node-v0.6.6.msi

Install jsdom:

$ npm install jsdom

Then run this script, modified with your URL and the relevant selectors:

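var jsdom = require('jsdom');

// Old jsdom API (jsdom.env); newer jsdom releases replaced it with `new JSDOM(...)`.
jsdom.env({
    html: 'url', // replace 'url' with the address of the page to scrape
    done: function(errors, window) {
        if (errors) {
            console.error(errors);
            return;
        }
        // Look the target div up by its id ('foo' here) and print its text
        console.log(window.document.getElementById('foo').textContent);
    }
});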
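The script above looks the div up by id. If the div you need has no id and you have to walk the nested divs yourself, a depth-first search with an explicit stack is one option. This is a sketch, not part of jsdom's API: `findFirst` and `isDivWithClass` are hypothetical helpers that rely only on the `tagName`, `className` and `children` properties DOM elements expose, so they should also work on a jsdom or browser `window.document.body`. The plain objects in the usage stand in for real elements.

```javascript
// Hypothetical helper: iterative depth-first search over an element tree.
// Relies only on `children` (an array-like of child elements), so it works
// on DOM elements and on plain objects shaped the same way.
function findFirst(root, predicate) {
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    if (predicate(node)) {
      return node;
    }
    const children = node.children || [];
    // Push children in reverse so the leftmost child is visited first,
    // which keeps the visit order equal to document order.
    for (let i = children.length - 1; i >= 0; i--) {
      stack.push(children[i]);
    }
  }
  return null;
}

// Hypothetical predicate factory: matches a <div> carrying the given class.
function isDivWithClass(className) {
  return (node) =>
    typeof node.tagName === 'string' &&
    node.tagName.toUpperCase() === 'DIV' &&
    (node.className || '').split(/\s+/).includes(className);
}

// Usage with plain objects standing in for DOM elements.
const tree = {
  tagName: 'DIV', id: 'outer', className: '', children: [
    { tagName: 'DIV', id: 'a', className: 'nav', children: [] },
    { tagName: 'DIV', id: 'b', className: '', children: [
      { tagName: 'DIV', id: 'c', className: 'content main', children: [] },
    ] },
  ],
};

const hit = findFirst(tree, isDivWithClass('content'));
console.log(hit ? hit.id : null); // c
```

Because the visit order matches document order, the first match is the same element `querySelector` would return for the equivalent selector; the search stops as soon as it finds it.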

If you like jQuery's simple syntax, you can try Jerry:

Jerry is a jQuery in Java. Jerry is a fast and concise Java Library that simplifies HTML document parsing, traversing and manipulating.
Jerry is designed to change the way that you parse HTML content.

The syntax seems very simple; it should solve your problem in at most three lines of code.

http://jtidy.sourceforge.net/

JTidy is pretty good at parsing the DOM.

If what you're after is a selector engine, then Sizzle is your best bet. It's the engine used by jQuery.
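Sizzle's own API is not reproduced here. As a rough idea of what a selector engine does, the hypothetical sketch below compiles a simple compound selector such as `div#main.content` (optional tag, optional id, optional classes; no combinators, attributes or pseudo-classes) into a predicate and tests it against each element during a walk of the tree. `compile` and `selectAll` are illustrative names, and plain objects stand in for DOM elements; a real engine like Sizzle supports far more of the CSS selector grammar.

```javascript
// Hypothetical, heavily simplified selector engine: supports only compound
// selectors of the form tag#id.class1.class2 (each part optional).
function compile(selector) {
  const match = /^([a-zA-Z][\w-]*)?(?:#([\w-]+))?((?:\.[\w-]+)*)$/.exec(selector);
  if (!selector || !match) {
    throw new Error('Unsupported selector: ' + selector);
  }
  const tag = match[1] ? match[1].toUpperCase() : null;
  const id = match[2] || null;
  const classes = match[3] ? match[3].slice(1).split('.') : [];
  // The returned predicate checks tag, then id, then every required class.
  return (el) => {
    if (tag && String(el.tagName).toUpperCase() !== tag) return false;
    if (id && el.id !== id) return false;
    const own = (el.className || '').split(/\s+/);
    return classes.every((c) => own.includes(c));
  };
}

// Collect every element under `root`, in document order, matching `selector`.
function selectAll(root, selector) {
  const matches = compile(selector);
  const found = [];
  const visit = (el) => {
    if (matches(el)) found.push(el);
    for (const child of Array.from(el.children || [])) visit(child);
  };
  visit(root);
  return found;
}

// Usage with plain objects standing in for DOM elements.
const page = {
  tagName: 'DIV', id: 'wrap', className: '', children: [
    { tagName: 'DIV', id: 'main', className: 'content wide', children: [] },
    { tagName: 'SPAN', id: 'note', className: 'content', children: [] },
  ],
};

console.log(selectAll(page, 'div.content').map((el) => el.id)); // [ 'main' ]
```

Compiling the selector once and reusing the predicate for every element keeps the per-node work small; the cost that remains is visiting every node in the subtree.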

Give each div a unique ID and retrieve it with document.getElementById(id).
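The idea in this answer is that a lookup by unique id avoids walking the nested divs yourself. If you are working on a parsed tree outside a browser and need many lookups, a similar effect can be sketched by walking the tree once and building a `Map` from id to element. `buildIdIndex` is a hypothetical helper, and the plain objects in the usage stand in for DOM elements.

```javascript
// Hypothetical helper: walk the tree once (in document order) and map each
// id to its element, so later lookups are one Map access, not a full walk.
function buildIdIndex(root) {
  const index = new Map();
  const stack = [root];
  while (stack.length > 0) {
    const el = stack.pop();
    // If an id is (incorrectly) duplicated, keep the first one in document order.
    if (el.id && !index.has(el.id)) {
      index.set(el.id, el);
    }
    const children = el.children || [];
    // Reverse push so the leftmost child is popped (visited) first.
    for (let i = children.length - 1; i >= 0; i--) {
      stack.push(children[i]);
    }
  }
  return index;
}

// Usage with plain objects standing in for DOM elements.
const doc = {
  tagName: 'DIV', id: 'root', children: [
    { tagName: 'DIV', id: 'header', children: [] },
    { tagName: 'DIV', id: 'body', children: [
      { tagName: 'DIV', id: 'target', textContent: 'hello', children: [] },
    ] },
  ],
};

const byId = buildIdIndex(doc);
console.log(byId.get('target').textContent); // hello
```

The trade-off is one up-front pass plus extra memory, and the index goes stale if the tree is modified after it is built.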
