
How to read specific content from a dynamic web page using the jsoup and Jericho APIs

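I'm currently using the jsoup API to read content from a web page, but it reads everything. I don't want all of the content; I only want specific content from a URL that is supplied dynamically. I also tried the Jericho API, but it did not solve my problem.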

Let's take this example:

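// Needs org.jsoup.Jsoup, org.jsoup.nodes.Document and org.jsoup.select.Elements
Document doc = Jsoup.connect("http://www.url.com").get();
// Select only the <span> elements that have class "content"
Elements elem = doc.select("span.content");
// Elements is zero-indexed, so the first two matches are at 0 and 1
System.out.println(elem.get(0).text());
System.out.println(elem.get(1).text());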

If the URL returns this HTML:

<html>
<body>

<span class="content">data one</span>

<span class="content">data two</span>

<a class="content">data three</a>

</body>
</html>

You'll get only the first and second elements, because the selector span.content does not match the <a> element:

<span class="content">data one</span>

<span class="content">data two</span>

UPDATE

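// One way to narrow the result down to a specific piece of text
$search = "data two";
// preg_quote escapes any regex metacharacters in the search text
$re = "/(.*)(" . preg_quote($search, "/") . ")(.*)/i";

// For example, suppose the doc object holds these HTML elements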
$str = '<span class="content">data one</span>
<span class="content">data two</span>
<span class="content">data two</span>
<a class="content">data three</a>';

preg_match_all($re, $str, $matches);
print_r($matches[0]);

OUTPUT

Array
(
    [0] => <span class="content">data two</span>
    [1] => <span class="content">data two</span>
)
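
Since the question is about Java, here is a rough Java port of the same line-filtering idea, using only java.util.regex. The names LineFilter and linesContaining are made up for illustration, and the sketch assumes each element sits on its own line, as in the sample above. It filters raw lines of markup rather than parsed elements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jsoup or Jericho.
class LineFilter {
    // Returns every line of `markup` that contains `search`, ignoring case.
    static List<String> linesContaining(String markup, String search) {
        // Pattern.quote makes the search text literal, like preg_quote in PHP.
        // By default "." does not match line breaks, so each match stays on one line.
        Pattern re = Pattern.compile(".*" + Pattern.quote(search) + ".*",
                                     Pattern.CASE_INSENSITIVE);
        List<String> hits = new ArrayList<>();
        Matcher m = re.matcher(markup);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String html = "<span class=\"content\">data one</span>\n"
                    + "<span class=\"content\">data two</span>\n"
                    + "<span class=\"content\">data two</span>\n"
                    + "<a class=\"content\">data three</a>";
        for (String line : linesContaining(html, "data two")) {
            System.out.println(line);
        }
    }
}
```

Run on the sample markup, this prints the two "data two" lines. If you already have a jsoup Document, you could pass the string form of the selected elements as the markup argument.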

