
Javascript: regex with variables to match part of a string containing HTML code

I'm trying to match a regex that contains one variable against a page of HTML code stored as a string.

I have split the HTML string on a certain tag, so it is now an array in which each element contains markup like the snippet below. Each element holds the data for one house (name, floor area in square meters, etc.; fictional, of course). I need to pick out exactly one of these houses by matching the text inside the first TD tag, and the part I need is the VALUE (digits) of the last hidden INPUT tag in the form, houseid.

<TR BGCOLOR=#D4C0A1>
 <TD WIDTH=40%><NOBR>Luminous&#160;Arc&#160;2</NOBR></TD>
 <TD WIDTH=10%><NOBR>154&#160;sqm</NOBR></TD>
 <TD WIDTH=10%><NOBR>6460&#160;gold</NOBR></TD>
 <TD WIDTH=40%><NOBR>rented</NOBR></TD>
 <TD><TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0>
 <FORM ACTION= METHOD=post><TR><TD>
  <INPUT TYPE=hidden NAME=world VALUE=Olympa>
  <INPUT TYPE=hidden NAME=town VALUE="Yalahar">
  <INPUT TYPE=hidden NAME=state VALUE=>
  <INPUT TYPE=hidden NAME=type VALUE=houses>
  <INPUT TYPE=hidden NAME=order VALUE=>
  <INPUT TYPE=hidden NAME=houseid VALUE=37010>
  <INPUT TYPE=image NAME="View" ALT="View" SRC="" BORDER=0 WIDTH=120 HEIGHT=18>
</TD></TR></FORM></TABLE></TD></TR>

I constructed the following regex:

var regex = new RegExp(house + "[\\s\\S]+name=houseid value=([0-9]+)>", "i");

where house is the name of the house (in this example, Luminous&#160;Arc&#160;2) and the part I need is the houseid, 37010.
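Because this pattern is assembled from a string, every regex backslash has to be written twice in the JavaScript literal, and any regex metacharacters in house (such as `.`, `(` or `+`) should be escaped so the name is matched literally. The sketch below illustrates both points; `escapeRegExp` and `buildHouseRegex` are hypothetical helper names, not part of any library.

```javascript
// Hypothetical helper: escape characters that have special meaning in a regex,
// so that a house name is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build the houseid pattern for one house name.
// "[\\s\\S]" in the string literal becomes [\s\S] in the regex (any character).
function buildHouseRegex(house) {
  return new RegExp(
    escapeRegExp(house) + "[\\s\\S]+?name=houseid value=([0-9]+)>",
    "i"
  );
}

// One level of escaping too many: "[\\\\s\\\\S]" becomes [\\s\\S] in the regex,
// a character class containing only a backslash, "s" and "S".
const doubled = new RegExp("[\\\\s\\\\S]");
const single = new RegExp("[\\s\\S]");

console.log(single.source);  // [\s\S]
console.log(doubled.source); // [\\s\\S]
```

With this, something like buildHouseRegex(house).exec(houses[i]) should return a match array whose index 1 holds the houseid digits.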

I expected this regex to work and give me the match I need, but houses[i].match(regex) returns null every time; I get no match in the string.

I have tried several approaches so far, including converting the string to a DOM object so I could split it on TR tags (the conversion failed). I feel that I'm close, but I'm stuck.

Does anyone see why my regex might fail?

Kenneth

You could add the string to your HTML (inside a display:none div or something similar) and then access the DOM as you would anywhere else.

For example:

<div id="stringContainer" style="display:none"></div>
var searchstring = "Luminous&#160;Arc&#160;2";
searchstring = searchstring.replace(/&#160;/g, '&nbsp;'); // innerHTML serializes U+00A0 as &nbsp;, so convert &#160; to &nbsp;

var c = document.getElementById("stringContainer");
c.innerHTML = '<table>' + houses + '</table>';
var h = c.getElementsByTagName('tr'); // Includes the nested TRs inside each form.

for (var i = 0, l = h.length; i < l; i++) { // Loop through the found rows.
    var nobr = h[i].cells[0].getElementsByTagName('nobr')[0]; // The house's name (undefined for the nested form rows). Not called "name": a global var named "name" would hit window.name.
    if (nobr && nobr.innerHTML == searchstring) { // If the name matches the search string.
        console.log(h[i].querySelector('input[name="houseid"]').value); // Log the houseid value.
    }
}
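An alternative to rewriting &#160; as &nbsp; is to compare against textContent, which returns the decoded character (U+00A0 for &#160;) instead of re-serialized markup. The hypothetical helper below decodes numeric character references in the search string so it can be compared that way; it is a sketch that handles only numeric references such as &#160; or &#xA0;, not named ones such as &nbsp;.

```javascript
// Hypothetical helper: decode numeric character references such as &#160;
// (decimal) or &#xA0; (hex) into the characters they stand for.
// Named references like &nbsp; are left untouched.
function decodeNumericEntities(text) {
  return text.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, (_, code) =>
    String.fromCodePoint(
      code[0].toLowerCase() === "x"
        ? parseInt(code.slice(1), 16)
        : parseInt(code, 10)
    )
  );
}

const searchstring = decodeNumericEntities("Luminous&#160;Arc&#160;2");
console.log(searchstring === "Luminous\u00a0Arc\u00a02"); // true

// In the browser loop, the comparison would then become:
//   if (nobr && nobr.textContent === searchstring) { ... }
```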


This assumes the variable houses is:

var houses = '<TR BGCOLOR=#D4C0A1>\n\
<TD WIDTH=40%><NOBR>Luminous&#160;Arc&#160;2</NOBR></TD>\n\
<TD WIDTH=10%><NOBR>154&#160;sqm</NOBR></TD>\n\
<TD WIDTH=10%><NOBR>6460&#160;gold</NOBR></TD>\n\
<TD WIDTH=40%><NOBR>rented</NOBR></TD>\n\
<TD>\n\
    <TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0>\n\
        <FORM ACTION= METHOD=post>\n\
            <TR>\n\
            <TD>\n\
            <INPUT TYPE=hidden NAME=world VALUE=Olympa>\n\
            <INPUT TYPE=hidden NAME=town VALUE="Yalahar">\n\
            <INPUT TYPE=hidden NAME=state VALUE=>\n\
            <INPUT TYPE=hidden NAME=type VALUE=houses>\n\
            <INPUT TYPE=hidden NAME=order VALUE=>\n\
            <INPUT TYPE=hidden NAME=houseid VALUE=37010>\n\
            <INPUT TYPE=image NAME="View" ALT="View" SRC="" BORDER=0 WIDTH=120 HEIGHT=18>\n\
            </TD>\n\
            </TR>\n\
        </FORM>\n\
    </TABLE>\n\
</TD>\n\
</TR>\n\
<TR BGCOLOR=#D4C0A1>\n\
<TD WIDTH=40%><NOBR>Dark&#160;Arc&#160;2</NOBR></TD>\n\
<TD WIDTH=10%><NOBR>154&#160;sqm</NOBR></TD>\n\
<TD WIDTH=10%><NOBR>6460&#160;gold</NOBR></TD>\n\
<TD WIDTH=40%><NOBR>rented</NOBR></TD>\n\
<TD>\n\
    <TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0>\n\
        <FORM ACTION= METHOD=post>\n\
            <TR>\n\
            <TD>\n\
            <INPUT TYPE=hidden NAME=world VALUE=Olympa>\n\
            <INPUT TYPE=hidden NAME=town VALUE="Yalahar">\n\
            <INPUT TYPE=hidden NAME=state VALUE=>\n\
            <INPUT TYPE=hidden NAME=type VALUE=houses>\n\
            <INPUT TYPE=hidden NAME=order VALUE=>\n\
            <INPUT TYPE=hidden NAME=houseid VALUE=37010>\n\
            <INPUT TYPE=image NAME="View" ALT="View" SRC="" BORDER=0 WIDTH=120 HEIGHT=18>\n\
            </TD>\n\
            </TR>\n\
        </FORM>\n\
    </TABLE>\n\
</TD>\n\
</TR>';
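Without a DOM, the same lookup can be sketched on a string like this by first splitting it into one chunk per house row and only then running small regexes on each chunk, which is roughly what the question does with its array. This assumes every house row starts with `<TR BGCOLOR=` as in the sample markup; `findHouseId` is a hypothetical name, the shortened sample below uses a made-up second houseid so the rows can be told apart, and regex-based HTML handling like this stays fragile for markup that differs from the sample.

```javascript
// Hypothetical helper: return the houseid of the row whose first <NOBR>
// contains exactly `name`, or null if there is no such row.
// Assumes each house row begins with "<TR BGCOLOR=" (as in the sample markup).
function findHouseId(html, name) {
  // Split just before each row start, keeping the delimiter with its row.
  const rows = html.split(/(?=<TR BGCOLOR=)/i);
  for (const row of rows) {
    const nameMatch = /<NOBR>([\s\S]*?)<\/NOBR>/i.exec(row);
    if (nameMatch && nameMatch[1] === name) {
      const idMatch = /NAME=houseid VALUE=([0-9]+)/i.exec(row);
      return idMatch ? idMatch[1] : null;
    }
  }
  return null;
}

// Shortened, hypothetical sample; the second houseid is made up.
const sample =
  "<TR BGCOLOR=#D4C0A1><TD WIDTH=40%><NOBR>Luminous&#160;Arc&#160;2</NOBR></TD>" +
  "<INPUT TYPE=hidden NAME=houseid VALUE=37010></TR>" +
  "<TR BGCOLOR=#D4C0A1><TD WIDTH=40%><NOBR>Dark&#160;Arc&#160;2</NOBR></TD>" +
  "<INPUT TYPE=hidden NAME=houseid VALUE=99999></TR>";

console.log(findHouseId(sample, "Dark&#160;Arc&#160;2")); // 99999
```

With the houses string above, findHouseId(houses, "Luminous&#160;Arc&#160;2") should return "37010"; the nested `<TR>` inside each form has no BGCOLOR, so it does not start a new chunk.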

I tried your regex with Cerbrus's houses variable and it works fine.
(I added the lazy quantifier ? to [\\s\\S]+, but it also works without it here.)

var house = "Luminous&#160;Arc&#160;2";
var regex = new RegExp( house + "[\\s\\S]+?name=houseid value=([0-9]+)>", "i" );

houses.match( regex )[1];    // "37010"
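The lazy quantifier matters as soon as one string holds several rows. In the houses variable above both rows happen to carry houseid 37010, which hides the difference; the hypothetical two-row string below gives them different (made-up) ids so the greedy and lazy patterns can be compared.

```javascript
// Two shortened rows with different, made-up houseids.
const twoRows =
  "<NOBR>Luminous&#160;Arc&#160;2</NOBR> <INPUT TYPE=hidden NAME=houseid VALUE=37010>\n" +
  "<NOBR>Dark&#160;Arc&#160;2</NOBR> <INPUT TYPE=hidden NAME=houseid VALUE=12345>";

const house = "Luminous&#160;Arc&#160;2";

// Greedy: [\s\S]+ runs to the end of the string and backtracks only as far as
// the LAST "name=houseid value=...>", which belongs to the wrong row.
const greedy = new RegExp(house + "[\\s\\S]+name=houseid value=([0-9]+)>", "i");

// Lazy: [\s\S]+? stops at the FIRST houseid after the house name.
const lazy = new RegExp(house + "[\\s\\S]+?name=houseid value=([0-9]+)>", "i");

console.log(twoRows.match(greedy)[1]); // 12345
console.log(twoRows.match(lazy)[1]);   // 37010
```

When each array element holds exactly one house, as in the question, both versions return the same id.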

Since the regex works on this input, presumably either your house variable has the wrong value or houses[i] is not the string you expect.
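One plausible way for house to have the "wrong value" is that it contains ordinary spaces (for example "Luminous Arc 2") while the HTML contains &#160; entities, or the other way round. A sketch, assuming that is the mismatch, that tolerates either form by accepting whitespace (which in JavaScript's `\s` includes U+00A0), `&#160;` or `&nbsp;` between words; `houseNamePattern` is a hypothetical helper.

```javascript
// Escape regex metacharacters so each word of the name is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a pattern source for `name` in which any run of
// whitespace (including U+00A0), "&#160;" or "&nbsp;" between words is accepted.
function houseNamePattern(name) {
  const words = name.split(/(?:\s|&#160;|&nbsp;)+/).filter(Boolean);
  const gap = "(?:\\s|&#160;|&nbsp;)+";
  return words.map(escapeRegExp).join(gap);
}

const regex = new RegExp(
  houseNamePattern("Luminous Arc 2") + "[\\s\\S]+?name=houseid value=([0-9]+)>",
  "i"
);

const row =
  "<NOBR>Luminous&#160;Arc&#160;2</NOBR>\n<INPUT TYPE=hidden NAME=houseid VALUE=37010>";
console.log(row.match(regex)[1]); // 37010
```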
