
Extract data from HTML using Java

I want to extract data from HTML using Java. I tried Jsoup, but so far I have been unable to extract the correct data. Here is the HTML snippet I am trying to extract the data from:

<a href="javascript:;" id="listen_880966" onclick="MP3PREVIEWPLAYER.showHiddePlayer(880966, 'http://mksh.free.fr/' + 'lol/mp3/Paint_It_Black/18_the_black_dahlia_murder_-_paint_it_black_(rolling_stones)-bfhmp3.mp3')" title="Listen Paint it Black    The Black Dahlia Murder   Great Metal Covers 36" class="button button-s button-1 listen "   >

I want the link ('http://mksh.free.fr/' + 'lol/mp3/Paint_It_Black/18_the_black_dahlia_murder_-_paint_it_black_(rolling_stones)-bfhmp3.mp3') and the title to be extracted into separate variables. Sample code along with the answer would be really helpful.

You can use regular expressions to match the sections you want. Then you can use something like String.split(delimiter) to pull the specific pieces out of each match. See the String.split() documentation for details on that method.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Main
{
    public static void main(String[] args)
    {
        String mydata = "<a href=\"javascript:;\" id=\"listen_880966\" onclick=\"MP3PREVIEWPLAYER.showHiddePlayer(880966, 'http://mksh.free.fr/' + 'lol/mp3/Paint_It_Black/18_the_black_dahlia_murder_-_paint_it_black_(rolling_stones)-bfhmp3.mp3')\" title=\"Listen Paint it Black    The Black Dahlia Murder   Great Metal Covers 36\" class=\"button button-s button-1 listen \"   >";
        // Matches the two quoted URL pieces joined by '+' inside the onclick handler.
        // The dots and the plus sign are escaped so they match literally.
        Pattern pattern = Pattern.compile("'http://mksh\\.free\\.fr/'\\s\\+\\s'[().A-Za-z0-9/_-]+'");
        // Matches the whole title attribute, including its name and quotes.
        Pattern title = Pattern.compile("title=\"[A-Za-z0-9\\s]+\"");
        Matcher matcher = pattern.matcher(mydata);
        if (matcher.find())
        {
            System.out.println(matcher.group(0));
        }
        matcher = title.matcher(mydata);
        if (matcher.find())
        {
            System.out.println(matcher.group(0));
        }
    }
}
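
Since the question asks for the link and the title in separate variables, here is a minimal sketch of one way to get there with capture groups instead of String.split(). The class name AnchorExtractor and the helpers extractUrl and extractTitle are hypothetical names chosen for illustration. The patterns assume the onclick handler always joins exactly two single-quoted string literals with a plus sign and that the title attribute is double-quoted, as in the snippet above. Other markup would need a different pattern or an HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorExtractor {
    // Assumption: the URL is written as two single-quoted literals joined by '+'.
    // Group 1 captures the first literal, group 2 the second.
    private static final Pattern URL_PARTS =
            Pattern.compile("'(https?://[^']*)'\\s*\\+\\s*'([^']*)'");

    // Assumption: the title attribute value is wrapped in double quotes.
    // Group 1 captures the value without the quotes.
    private static final Pattern TITLE =
            Pattern.compile("\\btitle=\"([^\"]*)\"");

    // Returns the two URL pieces concatenated, or null if the pattern is not found.
    static String extractUrl(String html) {
        Matcher m = URL_PARTS.matcher(html);
        return m.find() ? m.group(1) + m.group(2) : null;
    }

    // Returns the raw title attribute value, or null if there is no title attribute.
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<a href=\"javascript:;\" id=\"listen_880966\" onclick=\"MP3PREVIEWPLAYER.showHiddePlayer(880966, 'http://mksh.free.fr/' + 'lol/mp3/Paint_It_Black/18_the_black_dahlia_murder_-_paint_it_black_(rolling_stones)-bfhmp3.mp3')\" title=\"Listen Paint it Black    The Black Dahlia Murder   Great Metal Covers 36\" class=\"button button-s button-1 listen \"   >";

        String url = extractUrl(html);     // link in its own variable
        String title = extractTitle(html); // title in its own variable

        System.out.println(url);
        System.out.println(title);
    }
}
```

For the snippet from the question, url holds http://mksh.free.fr/lol/mp3/Paint_It_Black/18_the_black_dahlia_murder_-_paint_it_black_(rolling_stones)-bfhmp3.mp3 and title holds the attribute text with its original spacing. Both helpers return null when nothing matches, so callers should check for that before using the result.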


