I want to read a web page that spans multiple pages, for example page=1 through page=100:
import org.htmlcleaner.*;
...
String url = "http://www.webpage.com/search?id=10&page=";
for (int j = 1; j <= 100; j++) {
    WebParse thp = new WebParse(new URL(url + j));
Sometimes I get the following error:
java.io.FileNotFoundException: http://www.webpage.com/search?id=10&page=18
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at java.net.URL.openStream(Unknown Source)
at org.htmlcleaner.Utils.readUrl(Utils.java:63)
at org.htmlcleaner.HtmlCleaner.clean(HtmlCleaner.java:373)
at org.htmlcleaner.HtmlCleaner.clean(HtmlCleaner.java:387)
at <mypackage>.WebParse.<init>(WebParse.java:21)
at <mypackage>.WebParse.runThis(WebParse.java:54)
at <mypackage>.WebParse.main(WebParse.java:43)
I think this issue is caused by my network connection, because when I rerun the program it sometimes works fine.
How can I make it automatically retry when this error occurs?
Why don't you add a few retry attempts with a short delay between them?
import java.io.FileNotFoundException;
import java.net.URL;

// Note: new URL(...) can also throw MalformedURLException,
// so the enclosing method needs to declare or handle it.
for (int j = 1; j <= 100; j++) {
    int maxRetries = 3;
    int attempts = 0;
    boolean success = false;
    while (attempts < maxRetries && !success) {
        attempts++;
        try {
            WebParse thp = new WebParse(new URL(url + j));
            success = true;
        } catch (FileNotFoundException e) {
            e.printStackTrace();
            try {
                Thread.sleep(1000); // play nice: wait a second before retrying
            } catch (InterruptedException e1) {
                Thread.currentThread().interrupt(); // restore the interrupt flag
            }
        }
    }
}
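If you fetch more than one kind of page, it may be worth pulling the retry logic into a small helper method. The sketch below is only one way to do that; it assumes (as the stack trace suggests) that the WebParse constructor declares the FileNotFoundException, and the exponential backoff and the parseWithRetry name are my additions, not part of the original code:

import java.io.FileNotFoundException;
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper: tries to build a WebParse for the given URL,
// retrying on FileNotFoundException with an exponentially growing delay.
// Returns null if every attempt fails.
static WebParse parseWithRetry(String pageUrl, int maxRetries) throws MalformedURLException {
    long delayMs = 1000; // first wait is one second
    for (int attempt = 1; attempt <= maxRetries; attempt++) {
        try {
            return new WebParse(new URL(pageUrl));
        } catch (FileNotFoundException e) {
            e.printStackTrace();
            try {
                Thread.sleep(delayMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // restore the interrupt flag
                return null;
            }
            delayMs *= 2; // back off: 1s, 2s, 4s, ...
        }
    }
    return null; // all retries exhausted
}

The loop then shrinks to:

for (int j = 1; j <= 100; j++) {
    WebParse thp = parseWithRetry(url + j, 3);
}

Doubling the delay between attempts is gentler on a flaky server than hammering it at a fixed interval.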