
How to remove white space between paragraphs in Jsoup output?

Here is my code. The output also prints blank lines between the paragraphs. How can I remove this white space between paragraphs? After that, I want to store the text sentence by sentence in an ArrayList.

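    import java.io.IOException;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    // The class name is illustrative; the original post shows only main().
    public class ParagraphPrinter {

        public static void main(String[] args) {
            try {
                String url = "http://www.divaina.com/";

                // Route HTTP requests through the proxy.
                System.setProperty("http.proxyHost", "cache.mrt.ac.lk");
                System.setProperty("http.proxyPort", "3128");

                // Fetch and parse the page, with a 10-second timeout.
                Document doc = Jsoup.connect(url).timeout(10000).get();

                // Print the text of every <p> element, including empty ones.
                Elements paragraphs = doc.select("p");
                for (Element p : paragraphs) {
                    System.out.println(p.text());
                }
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }
    }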

When I add the content directly to the database, the white space is added as well. How can I remove the white space between paragraphs? What I actually want is to read the content of a web page and add it to the database line by line. Is there a better way to do this?

[Screenshot of the output]

Evidently some of the paragraphs contain no text. This might help:

for (Element p : paragraphs) {
    // Skip paragraphs whose text is empty.
    if (!p.text().isEmpty()) {
        System.out.println(p.text());
    }
}
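
Building on the check above, here is a minimal sketch of collecting the non-empty paragraphs into an ArrayList instead of printing them. It uses plain strings in place of p.text() so that it runs without Jsoup. The names ParagraphFilter and nonBlank are hypothetical, and treating a non-breaking space (U+00A0) as blank is an assumption about what the page contains, added because String.trim() does not remove that character.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Jsoup.
public class ParagraphFilter {

    // Returns the trimmed paragraphs that still contain visible text.
    public static List<String> nonBlank(List<String> paragraphs) {
        List<String> result = new ArrayList<>();
        for (String text : paragraphs) {
            // Assumption: a non-breaking space counts as white space too.
            String cleaned = text.replace('\u00A0', ' ').trim();
            if (!cleaned.isEmpty()) {
                result.add(cleaned);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-ins for the values p.text() would return.
        List<String> texts = Arrays.asList(
                "First paragraph.", "", "  ", "\u00A0", "Second paragraph.");
        System.out.println(nonBlank(texts));
    }
}
```

With Jsoup, the same loop could take p.text() for each Element in paragraphs as its input.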

Use a regular expression. Note that \s matches every white-space character, so this also removes the spaces between words:

String withoutSpace = whitespace.replaceAll("\\s", "");

Or try this, which removes only line breaks:

String withoutSpace = whitespace.replace("\n", "").replace("\r", "");
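
If the goal is to store the text sentence by sentence, removing all white space is probably too aggressive. Below is a rough sketch, assuming each paragraph's text is already available as a String: it collapses runs of white space into a single space and then splits the result with java.text.BreakIterator from the standard library. The names SentenceSplitter and toSentences are hypothetical, and Locale.ENGLISH is an assumption; text in other languages may need a different locale or a different approach.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration.
public class SentenceSplitter {

    public static List<String> toSentences(String text) {
        // Collapse line breaks, tabs and repeated spaces into single spaces.
        String normalized = text.replaceAll("\\s+", " ").trim();

        List<String> sentences = new ArrayList<>();
        // Assumption: English sentence-boundary rules.
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(normalized);

        // Walk the boundaries and cut out one sentence per step.
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = normalized.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String paragraph = "First sentence.\n\n  Second   sentence? Third one.";
        for (String s : toSentences(paragraph)) {
            System.out.println(s);
        }
    }
}
```

Each element of the returned list could then be inserted into the database as its own row.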


 