Load a URL only once

Question

I have a table of URLs that I want to load. The same URL can appear in the table more than once.

For example, a table with three values: url1, url2, url1.

After loading a URL, I extract one piece of its HTML (a single element, for example).

I have this:

    HtmlPage page = null;

    for (int i = 0; i < tabUrlSource.length; i++) {
        try {
            // Every iteration loads its URL, including URLs that were already loaded
            page = webClient.getPage(tabUrlSource[i]);
            List<HtmlElement> nbElements = (List<HtmlElement>) page.getByXPath(tabXpathSource[i]);
            if (null != nbElements && !nbElements.isEmpty()) {
                htmlResult = nbElements.get(0).asText();
            }

    ...

But this is not very efficient, because it loads url1 twice and url2 once. In effect there are three URLs to load, which makes the processing take longer.

How can I load each URL only once and still get the same final result?

I hope my English is clear, and my question too.

Regards.

Thank you.

Answer 1

You could use a Set<HtmlElement> instead of a List. This will remove duplicates automatically.

This of course depends on HtmlElements being comparable. If they aren't, you could instead add all the URLs to a Set<String> and then iterate over that.

Update

To clarify the second part:

A Set is described like this in the Javadocs:

A collection that contains no duplicate elements. More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element. As implied by its name, this interface models the mathematical set abstraction.

In other words, to ensure that there are no duplicates, it relies on the elements being comparable via the equals() method. If HtmlElement hasn't overridden this method, the Set will just use the Object.equals() method, which compares object references instead of the actual data in the HtmlElements.

However, String has overridden the equals() method, so you can be certain that duplicate Strings will be removed from a Set<String>.
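As a rough illustration of the Set<String> suggestion, here is a minimal sketch. It assumes the URLs are held as plain strings; the class name UniqueUrls and the method distinct are made up for this example. A LinkedHashSet is used so the URLs keep the order in which they first appear.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the original code.
class UniqueUrls {
    // Returns each URL once, in first-seen order.
    // Duplicates are dropped because String.equals() compares content.
    static List<String> distinct(String[] urls) {
        Set<String> seen = new LinkedHashSet<>(Arrays.asList(urls));
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        String[] tabUrlSource = {"url1", "url2", "url1"};
        System.out.println(distinct(tabUrlSource)); // prints [url1, url2]
    }
}
```

The loop that calls webClient.getPage would then run over the returned list instead of the original array.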

Answer 2

What Keppil answered is correct, but the Set should hold the URLs from tabUrlSource rather than being a Set<HtmlElement>.

EDIT: Okay, what is the content of tabUrlSource[i]? Is it of type URL or a custom type? This is how it would look if it is URL:

    Set<URL> uniqueURLs = new HashSet<URL>();

    // Adding an element that is already in the set has no effect
    for (int i = 0; i < tabUrlSource.length; i++) {
        uniqueURLs.add(tabUrlSource[i]);
    }

And then iterate over this Set instead of the tabUrlSource array, like this:

    for (Iterator<URL> itr = uniqueURLs.iterator(); itr.hasNext(); ) {
        // Each distinct URL is loaded exactly once
        page = webClient.getPage(itr.next());
        .............
        .............

Continue with the rest of the code.

Also, you said you are using the index 'i' to associate the URL and the XPath. Will the XPath be the same for the same URL? If so, you can use a HashMap instead, with the URL as key and the XPath as value, so that duplicate keys are overwritten. Then you can iterate over the HashMap's keys to get the 'page' and use the 'value' to fetch the HtmlElement.
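A minimal sketch of the HashMap idea, assuming URLs and XPaths are both plain strings. The class name UrlToXpath and the method pair are hypothetical. A LinkedHashMap is used only to keep the iteration order predictable; a plain HashMap deduplicates in the same way.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the original code.
class UrlToXpath {
    // Builds a URL -> XPath map from the two parallel arrays.
    // If a URL appears more than once, the last XPath for it wins.
    static Map<String, String> pair(String[] urls, String[] xpaths) {
        Map<String, String> byUrl = new LinkedHashMap<>();
        for (int i = 0; i < urls.length; i++) {
            byUrl.put(urls[i], xpaths[i]);
        }
        return byUrl;
    }

    public static void main(String[] args) {
        String[] tabUrlSource = {"url1", "url2", "url1"};
        String[] tabXpathSource = {"//h1", "//p", "//h2"};
        for (Map.Entry<String, String> e : pair(tabUrlSource, tabXpathSource).entrySet()) {
            // prints "url1 -> //h2" then "url2 -> //p"
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Note that the earlier XPath for url1 is lost here, which is why this only fits the case where a URL always goes with the same XPath.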

If they are not the same, you can still use a HashSet like this:

    Set<URL> uniqueURLs = new HashSet<URL>();
    HtmlPage page = null;

    for (int i = 0; i < tabUrlSource.length; i++) {
        try {
            // Skip any URL that has already been loaded
            if (uniqueURLs.contains(tabUrlSource[i])) continue;
            else
                uniqueURLs.add(tabUrlSource[i]);
            page = webClient.getPage(tabUrlSource[i]);
            List<HtmlElement> nbElements = (List<HtmlElement>)
                    page.getByXPath(tabXpathSource[i]);
            if (null != nbElements && !nbElements.isEmpty()) {
                htmlResult = nbElements.get(0).asText();
            }
            // remainder of the try block and its catch clause as in the question
    }
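One thing to watch in the loop above: when a URL repeats, the iteration is skipped entirely, so the XPath paired with the repeat is never evaluated. If the XPaths can differ and every (URL, XPath) pair must still produce a result, one possible alternative is to cache the loaded page per URL. The sketch below is only an illustration under that assumption: PageCache, loader and loadCount are invented names, and the page is faked with a string. In real code the loader would presumably wrap webClient.getPage and deal with the exceptions it throws.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache, not part of the original code. P stands for the page type.
class PageCache<P> {
    private final Map<String, P> pages = new HashMap<>();
    private final Function<String, P> loader;
    private int loads = 0;

    PageCache(Function<String, P> loader) {
        this.loader = loader;
    }

    // Returns the cached page for the URL, loading it on first use only.
    P get(String url) {
        P page = pages.get(url);
        if (page == null) {
            page = loader.apply(url);
            loads++;
            pages.put(url, page);
        }
        return page;
    }

    // Number of times the loader was actually called.
    int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        String[] tabUrlSource = {"url1", "url2", "url1"};
        String[] tabXpathSource = {"//h1", "//h1", "//p"};
        // Stand-in loader; a real one would fetch the page over the network.
        PageCache<String> cache = new PageCache<>(url -> "<page for " + url + ">");
        for (int i = 0; i < tabUrlSource.length; i++) {
            String page = cache.get(tabUrlSource[i]);
            System.out.println(tabXpathSource[i] + " evaluated on " + page);
        }
        System.out.println("loads: " + cache.loadCount()); // prints loads: 2
    }
}
```

All three XPaths are evaluated, but url1 is only loaded once.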

Hope this helps :)


Answer 1
1 2012-07-18 08:08:05

Answer 2
1 Accepted 2012-07-18 08:19:45
