In this Java web application project I'm first trying to read the content of a page with the getUrlContentString() method (which seems to be working), and second, to display only the content between tags using the proccessString() method. The second method does not respond as expected and returns a blank page. What is causing the problem?
index.jsp
<%@page contentType="text/html" pageEncoding="UTF-8"%>
<!DOCTYPE html>
<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
        <title>JSP Page</title>
    </head>
    <body>
        <%= cookiePac.CookieJar.getUrlContentString("http://help.websiteos.com/"
                + "websiteos/example_of_a_simple_html_page.htm")%>
        <p>
            <%= cookiePac.CookieJar.proccessString()%>
        </p>
    </body>
</html>
CookieJar.java
package cookiePac;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class CookieJar {
    private final List<String> cookies;
    private static String rawCookiesString = "";
    private static String rawCookiesString_1 = "";

    public CookieJar () {
        this.cookies = new ArrayList<>();
    }

    /* read the page, store into rawCookiesString */
    public static String getUrlContentString (String theUrl) {
        StringBuilder content = new StringBuilder();
        try {
            URL url = new URL(theUrl);
            URLConnection urlConnection = url.openConnection();
            BufferedReader bufferedReader = new BufferedReader(
                    new InputStreamReader(urlConnection.getInputStream()));
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                content.append(line + "\n");
            }
            bufferedReader.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        rawCookiesString = content.toString();
        return " ";
    }

    /* select the content between <a> */
    public static String proccessString () {
        Pattern p = Pattern.compile("<a>(.*?)</a>");
        Matcher m = p.matcher(rawCookiesString);
        if (m.find()) {
            rawCookiesString_1 = m.group(1);
        }
        return rawCookiesString_1.toString();
    }
}
I created a project with your code and found several problems.
First, the static HTML returned for the URL you specified (the raw response without any scripts executed, not what you see in your browser console window) does not contain anchor tags. That's why you cannot get any content from this tag. Try, for example, http://www.cssdesignawards.com/ instead of your http://help.websiteos.com/websiteos/example_of_a_simple_html_page.htm.
Second, you're trying to match the tag with the pattern "<a>(.*?)</a>". That pattern only matches an anchor that has no attributes at all, and real anchors almost always carry attributes such as href or a CSS class. To increase the chances of matching anchor content, use "<a(.*?)</a>" instead of "<a>(.*?)</a>". Note that group 1 then also contains the tag's attributes, not just the text between the tags.
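As a rough illustration of the regex point, here is a minimal sketch of a stricter pattern that skips the attributes and captures only the text between the tags. The class name AnchorTextDemo, the method anchorTexts and the sample HTML are made up for this example and are not part of the original project. Regular expressions remain fragile for real-world HTML, so treat this as a sketch rather than a robust parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original CookieJar code.
class AnchorTextDemo {
    // "<a\b[^>]*>" accepts an opening anchor tag with or without attributes;
    // DOTALL lets the captured content span several lines.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\b[^>]*>(.*?)</a>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the content of every anchor found, in document order.
    static List<String> anchorTexts(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample input invented for the demonstration.
        String html = "<p><a class=\"nav\" href=\"/home\">Home</a> and <a>Bare</a></p>";
        System.out.println(anchorTexts(html)); // prints [Home, Bare]
    }
}
```

Using while (m.find()) instead of if (m.find()) collects every anchor rather than only the first one.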
Third, the getUrlContentString method is named as if it returns the HTML as a string, but it always returns just a blank string. Consider renaming the method or returning rawCookiesString.
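One possible way to make the method live up to its name is sketched below. The class name PageFetcher and the helper readAll are hypothetical, and decoding the response as UTF-8 is an assumption about the target page, so adjust as needed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical replacement for the fetching part of CookieJar.
class PageFetcher {
    // Reads everything from the given reader, ending each line with '\n'.
    static String readAll(Reader source) throws IOException {
        StringBuilder content = new StringBuilder();
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                content.append(line).append('\n');
            }
        }
        return content.toString();
    }

    // Returns the page content itself instead of a blank string.
    // Assumption: the page is encoded as UTF-8.
    static String getUrlContentString(String theUrl) throws IOException {
        URL url = new URL(theUrl);
        return readAll(new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Offline demonstration of the reading logic with an in-memory source.
        System.out.print(readAll(new StringReader("<html>\n<body>hi</body>\n</html>")));
    }
}
```

Returning the content also removes the need for the shared static field, so the processing method could take the HTML as a parameter instead.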