
Getting information from an HTML file

I'm writing a program that gets information from a web page and puts it in an Excel file.

The problem is that I can't find a way to search for the tag that holds the specific info.

Here is my code (so far):

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.URL;

    private void getAll() throws IOException {
        for (int i = 0; i < 250; i++) {
            URL vurl = new URL("http://www.bamart.be/nl/artists/detail/" + i);
            // try-with-resources closes the reader once the page has been read
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(vurl.openStream()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Expected to fire when the subcontent div is reached
                    if (line.equalsIgnoreCase("<div class=\"subcontent\">")) {
                        System.out.println("Found info!");
                    }
                    printInfo(line, i);
                }
            }
        }
    }

    // Prints one line of the page together with the artist page number
    private void printInfo(String info, int i) {
        System.out.println("/***********************************************/");
        System.out.println("************\t" + info + "**********************/");
        System.out.println("/************" + " Artist page:" + i + " of 999 **********************/");
    }

The "Found info!" println never shows up, even though that tag is in the HTML file.

    if (line.equalsIgnoreCase("<div class=\"subcontent\">")) { }

This if statement checks for exact equality (ignoring case), so it only matches when the line consists of that tag and nothing else. A line read from the page may contain other content as well, for example leading whitespace used for indentation.

What you probably want instead is something like:

    if (line.toLowerCase().contains("<div class=\"subcontent\">")) { }
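
To make the difference concrete, here is a minimal sketch that runs both checks against an indented line. The class name `SubcontentMatcher`, the helper methods, and the sample line are made up for illustration and are not part of the original program; the sample line only assumes that the page indents the tag.

```java
import java.util.Locale;

// Hypothetical helper class, for illustration only.
class SubcontentMatcher {
    // The tag being searched for, written in lower case.
    static final String MARKER = "<div class=\"subcontent\">";

    // The original check: true only if the whole line is exactly the tag.
    static boolean exactMatch(String line) {
        return line.equalsIgnoreCase(MARKER);
    }

    // The suggested check: true if the tag appears anywhere in the line.
    static boolean containsMatch(String line) {
        return line.toLowerCase(Locale.ROOT).contains(MARKER);
    }

    public static void main(String[] args) {
        // Assumed sample: the tag preceded by indentation and followed by a space.
        String line = "    <div class=\"subcontent\"> ";
        System.out.println(exactMatch(line));    // false
        System.out.println(containsMatch(line)); // true
    }
}
```

Note that `contains` still depends on the tag being written exactly this way (same attribute order, same quoting, single spaces), so it is a quick fix rather than a robust way to read HTML.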

For this example, try using Jsoup, an HTML parser library for Java.
