
Regular expression to parse Table and Select from HTML

I know regular expressions are not the right tool for this parsing job, but that is the approach recommended on my side.

Given the HTML below, I want to parse all the select info from the HTML table. For this I have used:

<table id='options_table'>\s*?(.+)?\s*?</table>

But this gives me a null result.

Then, to parse every select out of what the regex above returns, I will use:

<SELECT.*?>(.*?)<\/SELECT>

But both of the above give a null result.

What should the regexes be for the table and for the selects (taken from the parsed table HTML)?

HTML part:

<table id='options_table'>
    <tr><td colspan=3><font size="3" class="colors_productname">
    <i><b>Color</b></i>
    </font>
    <br /><table cellpadding="0" cellspacing="0" border="0"><tr><td><img class="vCSS_img_line_group_features" src="/v/vspfiles/templates/192/images/Line_Group_Features.gif" /></td></tr></table>
    </font></td></tr>
    <tr>
    <td align="right" vAlign="top">
    <img src="/v/vspfiles/templates/192/images/clear1x1.gif" width="1" height="4" border="0"><br />
    </td><td></td><td>
    <SELECT name="SELECT___S15FTAN01___29" onChange="change_option('SELECT___S15FTAN01___29',this.options[this.selectedIndex].value)">
    <OPTION value="176" >Ivory/Grey</OPTION>
    </SELECT>&nbsp;&nbsp;
    </td></tr>
    <tr>
    <td align="right" vAlign="top">
    <img src="/v/vspfiles/templates/192/images/clear1x1.gif" width="1" height="4" border="0"><br />
    </td><td></td><td>
    <SELECT name="SELECT___S15FTAN01___31" onChange="change_option('SELECT___S15FTAN01___31',this.options[this.selectedIndex].value)">
    <OPTION value="167" >0/3 months</OPTION>
    <OPTION value="169" >3/6 months</OPTION>
    <OPTION value="175" >6/9 months</OPTION>
    </SELECT>&nbsp;&nbsp;
    </td></tr>
    </table>

I don't know GoLang, but I can show you in Perl, and I think you will be able to relate it to GoLang.
First, a regex to store the content of the table tag ( https://regex101.com/r/tL7dA0/1 ):

$table = $1 if ($html =~ m/<table.*?>(.*)<\/table>/igs);

A regex for printing everything between the select tags ( https://regex101.com/r/xJ0xU1/1 ):

while ($table =~ m/<select.*?>(.*?)<\/select>/isg) {
    print $1."\n";
}

Note that if the HTML table contains an inner table, as in your case, the greedy (.*) runs to the last </table>, so all the content of the outer table is selected, inner table included.
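To make the nested-table behaviour concrete, here is a minimal Go sketch comparing a greedy and a lazy capture group. The helper `capture`, the variable names, and the sample HTML are made up for illustration; only the regex shape comes from the answer. The `(?is)` prefix is Go's way of turning on case-insensitive matching and letting the dot match newlines.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Greedy: (.*) extends to the LAST </table> in the input.
	greedyTable = regexp.MustCompile(`(?is)<table.*?>(.*)</table>`)
	// Lazy: (.*?) stops at the FIRST </table>, which closes the inner table.
	lazyTable = regexp.MustCompile(`(?is)<table.*?>(.*?)</table>`)
)

// capture returns the first capture group of re in s, or "" if nothing matches.
func capture(re *regexp.Regexp, s string) string {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	// Hypothetical sample: an outer table with one nested table.
	html := "<table id='outer'><tr><td><table><tr><td>inner</td></tr></table></td><td>rest</td></tr></table>"
	fmt.Println("greedy:", capture(greedyTable, html))
	fmt.Println("lazy:  ", capture(lazyTable, html))
}
```

The greedy version keeps the whole outer table body, which is what you want when the selects come after the inner table. The lazy version would cut the capture off at the inner table's closing tag.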

i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
s modifier: single line. Dot matches newline characters
g modifier: global. All matches (don't return on first match)
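For Go, a rough equivalent of the two Perl snippets might look like the sketch below. It assumes the standard `regexp` package, where the i and s modifiers are written inline as `(?is)` and the g behaviour comes from calling `FindAllStringSubmatch` with -1. The function name `selectBodies` and the shortened sample HTML are invented for illustration. A likely reason the original patterns return nothing is that, without the s flag, the dot does not match the newlines in the HTML.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// (?i) ignores case, (?s) lets . match newline characters.
	tableRe  = regexp.MustCompile(`(?is)<table.*?>(.*)</table>`)
	selectRe = regexp.MustCompile(`(?is)<select.*?>(.*?)</select>`)
)

// selectBodies returns the text between every <select>...</select> pair
// found inside the first (outermost) table in html, or nil if no table matches.
func selectBodies(html string) []string {
	m := tableRe.FindStringSubmatch(html)
	if m == nil {
		return nil
	}
	var out []string
	// -1 means "all matches", like Perl's g modifier.
	for _, s := range selectRe.FindAllStringSubmatch(m[1], -1) {
		out = append(out, s[1])
	}
	return out
}

func main() {
	// Shortened, hypothetical version of the HTML from the question.
	html := "<table id='options_table'>\n<tr><td>\n" +
		"<SELECT name=\"a\">\n<OPTION value=\"176\" >Ivory/Grey</OPTION>\n</SELECT>\n" +
		"</td></tr>\n</table>"
	for _, body := range selectBodies(html) {
		fmt.Printf("%q\n", body)
	}
}
```

Each returned string is the raw markup between the select tags, so the OPTION elements still need their own pass if you want the values and labels separately.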
