
Grabbing data between greater-than and less-than signs

I am mainly an SQL programmer with slight experience in Java.

I'm not going to bore you with all the code I have written, which works up to this point. Right now I am trying to extract data from a stock market site and write that data into a CSV file that I create.

I am retrieving the HTML line by line; it uses <td> and </td> to open and close columns. I want to grab the data between the greater-than sign and the less-than sign and then move on to the next one. I'm just struggling to figure this out without making it too complicated.

Describe expected and actual results:

So if I have

<td class="blah" class="blah">STOCK</td><td class="blah" class="blah">STOCK COMPANY NAME</td>

I want to grab STOCK into a string and then STOCK COMPANY NAME.

All I want help with is the code between > ***** < ... no more than that, because I am enjoying the learning process. I've just been stuck for a few hours.

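You can use a regex with a look-behind and a look-ahead: (?<=>).*?(?=<)
(?<=>) means "preceded by a greater-than symbol"
.*? matches any number of characters, non-greedy (as few as possible)
(?=<) means "followed by a less-than symbol"

Because the look-arounds do not consume the > and <, the pattern also produces an empty match between adjacent tags such as </td><td>, which is why the code filters out empty strings.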

import java.util.*;
import java.util.regex.*;
import java.util.stream.Collectors;

// The statements below go inside a method, e.g. main.
String input = "<td class=\"blah\" class=\"blah\">STOCK</td><td class=\"blah\" class=\"blah\">STOCK COMPANY NAME</td>";
Matcher matcher = Pattern.compile("(?<=>).*?(?=<)").matcher(input);
List<String> res = new ArrayList<>();
while (matcher.find()) res.add(matcher.group()); // collect every match, including empty ones
res = res.stream().filter(s -> !s.isEmpty()).collect(Collectors.toList()); // remove empty strings
System.out.println(res);

Output

[STOCK, STOCK COMPANY NAME]
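
As a variation on the look-around pattern above, here is a minimal sketch that uses a capturing group and a negated character class, so empty matches never occur and no filtering step is needed. The class name CellExtractor and the method extractCells are made up for illustration, and the sketch assumes each cell's text contains no nested tags and no literal < or > characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CellExtractor {
    // '>' then one or more characters that are neither '<' nor '>', then '<'.
    // Group 1 holds the text between the two signs.
    private static final Pattern CELL_TEXT = Pattern.compile(">([^<>]+)<");

    // Hypothetical helper: returns the non-blank text found between tags on one line.
    static List<String> extractCells(String line) {
        List<String> cells = new ArrayList<>();
        Matcher matcher = CELL_TEXT.matcher(line);
        while (matcher.find()) {
            String text = matcher.group(1).trim();
            if (!text.isEmpty()) { // skip whitespace-only gaps between tags
                cells.add(text);
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        String line = "<td class=\"blah\" class=\"blah\">STOCK</td>"
                + "<td class=\"blah\" class=\"blah\">STOCK COMPANY NAME</td>";
        // Naive comma join; real CSV output would need quoting for values containing commas.
        System.out.println(String.join(",", extractCells(line)));
    }
}
```

For the sample line this prints STOCK,STOCK COMPANY NAME. Since company names can contain commas, the join shown here is only a starting point for the CSV step.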

Note: It's best to use an HTML parser, such as jsoup, instead.
