
Grabbing Data Between Greater-Than and Less-Than Signs

I am mainly an SQL programmer with slight experience in Java.

I'm not going to bore you with all the code I have written, which works up to this point. At this point, I am trying to extract data from a stock market site and write that data into a CSV file that I create.

I am retrieving the HTML code line by line; it uses <td> and </td> to open and close columns. I want to grab the data between the greater-than sign (>) and the less-than sign (<) and then move on to the next cell. I'm just struggling to figure this out without making it too complicated.

Describe expected and actual results:

So if I have

<td class="blah" class="blah">STOCK</td><td class="blah" class="blah">STOCK COMPANY NAME</td>

I want to grab STOCK into a string and then STOCK COMPANY NAME.

All I want help with is the code to get what is between > ***** < ... no more than that, because I am enjoying the learning process... I've just been stuck for a few hours.

You can use a regex with look-behind and look-ahead: (?<=>).*?(?=<)
(?<=>) means preceded by a greater-than symbol
.*? matches any number of characters, non-greedily
(?=<) means followed by a less-than symbol
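To see why the non-greedy quantifier matters, here is a minimal sketch that runs the pattern described above next to a greedy variant. The class name LazyVsGreedy, the helper findAll, and the shortened input string are made up for illustration and are not part of the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyVsGreedy {
    // Hypothetical helper: collects every match of the regex, in order.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Shortened sample input, assumed for this sketch.
        String input = "<td>STOCK</td><td>NAME</td>";

        // Non-greedy: stops at the first '<' after each '>'.
        System.out.println(findAll("(?<=>).*?(?=<)", input)); // [STOCK, , NAME]

        // Greedy: runs on to the last '<' in the input.
        System.out.println(findAll("(?<=>).*(?=<)", input)); // [STOCK</td><td>NAME]
    }
}
```

The empty string in the non-greedy result comes from the spot between </td> and the next <td>, where a '>' is immediately followed by a '<'.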

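import java.util.*;
import java.util.regex.*;
import java.util.stream.Collectors;

// The statements below belong inside a method, such as main.
String input = "<td class=\"blah\" class=\"blah\">STOCK</td><td class=\"blah\" class=\"blah\">STOCK COMPANY NAME</td>";
Matcher matcher = Pattern.compile("(?<=>).*?(?=<)").matcher(input);
List<String> res = new ArrayList<>();
while (matcher.find()) res.add(matcher.group()); // collect every match in order
res = res.stream().filter(s -> !s.isEmpty()).collect(Collectors.toList()); // remove the empty matches found between adjacent tags
System.out.println(res);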

Output

[STOCK, STOCK COMPANY NAME]
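A possible variation on the look-around version above is to match each whole <td> cell and read its text from a capturing group, which avoids the empty matches between adjacent tags. This is only a sketch: the class name TdCells, the method extract, and the pattern itself are assumptions for illustration, and it presumes cells contain plain text with no nested tags and no '>' inside attribute values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TdCells {
    // Assumed pattern: an opening <td ...> tag, the cell text (group 1), then </td>.
    private static final Pattern TD_CELL = Pattern.compile("<td[^>]*>([^<]*)</td>");

    // Returns the trimmed text of every <td> cell found in one line of HTML.
    static List<String> extract(String html) {
        Matcher m = TD_CELL.matcher(html);
        List<String> cells = new ArrayList<>();
        while (m.find()) {
            cells.add(m.group(1).trim());
        }
        return cells;
    }

    public static void main(String[] args) {
        String input = "<td class=\"blah\" class=\"blah\">STOCK</td>"
                + "<td class=\"blah\" class=\"blah\">STOCK COMPANY NAME</td>";
        System.out.println(extract(input)); // [STOCK, STOCK COMPANY NAME]
    }
}
```

Unlike the filtered list above, this keeps an empty cell such as <td></td> as an empty string, which may be useful if the columns need to stay aligned when written to CSV.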

Note: It's best to use an HTML parser instead, like jsoup.
