
java regular expressions regex

I have a problem extracting data from a website. I'm trying to get the company name and its price, which here are SYGNITY and 8,40:

<a class="link" href="http://www.money.pl/gielda/spolki-gpw/PLCMPLD00016.html">SYGNITY</a>

        </td>
        <td class="ac"><a href="javascript: OO('SGN','2015-10-01')"><img width="12" height="11" src="http://static1.money.pl/i/gielda/chart.gif" title="Pokaż wykres" alt="Pokaż wykres" /></a></td>
                        <td class="al">SGN</td>
                    <td class="ar">8,40</td> 

I tried this pattern, but it doesn't work:

String expr = "<a class=\"link\" href=\"(.+?)\">(.+?)</a>(.+?)<td class=\"ar\">(.+?)</td> ";

Any advice?

Using JSoup parser

You should use an HTML parser such as Jsoup, since regex is a poor fit for parsing HTML.

You can do something like this:

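import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

String htmlString = "YOUR HTML HERE";
Document document = Jsoup.parse(htmlString);
// Select the company link by its exact href
Element element = document.select("a[href=http://www.money.pl/gielda/spolki-gpw/PLCMPLD00016.html]").first();
System.out.println(element.text()); // SYGNITY

// Take the first td with class "ar" (the price cell in the snippet above)
element = document.select("td[class=ar]").first();
System.out.println(element.text()); // 8,40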

Using regex

If you still want to use a regex, you could use one like the following and read the content from the capturing groups:

PLCMPLD00016.html">(.*?)<\/a>|"ar">(.*?)<\/td> 

Working demo

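import java.util.regex.Matcher;
import java.util.regex.Pattern;

String htmlString = "YOUR HTML HERE";
// Quotes inside the Java string literal must be escaped
Pattern pattern = Pattern.compile("PLCMPLD00016.html\">(.*?)<\\/a>|\"ar\">(.*?)<\\/td>");
Matcher matcher = pattern.matcher(htmlString);
while (matcher.find()) {
    // Each match comes from only one side of the alternation,
    // so the other group is null and must be skipped.
    if (matcher.group(1) != null) {
        System.out.println(matcher.group(1)); // SYGNITY
    } else {
        System.out.println(matcher.group(2)); // 8,40
    }
}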
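
As for why the pattern in the question finds nothing: the most likely cause is that in Java `.` does not match line terminators by default, so the lazy `(.+?)` between `</a>` and `<td class="ar">` cannot cross the line breaks in the HTML. Compiling with `Pattern.DOTALL` removes that restriction. Below is a minimal sketch of that fix, not a robust scraper. The class name `QuoteExtractor` and the method `extract` are made up for illustration, and the sample HTML is a shortened copy of the snippet from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteExtractor {
    // DOTALL lets '.' match line terminators, so ".+?" can span the lines
    // between the company link and the price cell.
    private static final Pattern ROW = Pattern.compile(
            "<a class=\"link\" href=\"[^\"]+\">(.+?)</a>.+?<td class=\"ar\">(.+?)</td>",
            Pattern.DOTALL);

    // Returns {name, price} for the first match, or null if nothing matches.
    static String[] extract(String html) {
        Matcher m = ROW.matcher(html);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        // Shortened copy of the HTML from the question (assumed layout)
        String html =
                "<a class=\"link\" href=\"http://www.money.pl/gielda/spolki-gpw/PLCMPLD00016.html\">SYGNITY</a>\n"
                + "        </td>\n"
                + "        <td class=\"al\">SGN</td>\n"
                + "        <td class=\"ar\">8,40</td> ";
        String[] result = extract(html);
        System.out.println(result[0] + " " + result[1]); // SYGNITY 8,40
    }
}
```

Because the middle `.+?` is lazy, the price is taken from the first `<td class="ar">` after the link. If the real page puts other `ar` cells before the price, the pattern would need to be adjusted.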
