detecting multiple html tags with javascript and regex

Question

I am building a Chrome extension that reads the current page and detects specific HTML/XML tags in it.

For example, suppose my current page contains the following tags and data:

some random text here and there

<investmentAccount acctType="individual" uniqueId="1629529524">
<accountName>state bank of america</accountName>
<accountHolder>rahul raina</accountHolder>
<balance balType="totalBalance">
<curAmt curCode="USD">516545.84</curAmt>
</balance>
<asOf localFormat="MMM dd, yyyy">2013-08-31T00:00:00</asOf>
<holdingList>
<holding holdingType="mutualFund" uniqueId="-2044388005">
<description>Active Global Equities</description>
<value curCode="USD">159436.01</value>
</holding>
<holding holdingType="mutualFund" uniqueId="-556870249">
<description>Passive Non-US Equities</description> 
<value curCode="USD">72469.76</value>
</holding>
</holdingList>
<transactionList/>
</investmentAccount>
</site>
some data 123

<site name="McKinsey401k">
<investmentAccount acctType="individual" uniqueId="1629529524">
<accountName>rahuk</accountName>
<accountHolder>rahuk</accountHolder>
<balance balType="totalBalance">
<curAmt curCode="USD">516545.84</curAmt>
</balance>
<asOf localFormat="MMM dd, yyyy">2013-08-31T00:00:00</asOf>
<holdingList>
<holding holdingType="mutualFund" uniqueId="1285447255">
<description>Special Sits. Aggr. Long-Term</description>
<value curCode="USD">101944.69</value>
</holding>
<holding holdingType="mutualFund" uniqueId="1721876694">
<description>Special Situations Moderate $</description>
<value curCode="USD">49444.98</value>
</holding>
</holdingList>
<transactionList/>
</investmentAccount>
</site>

So I need to identify, say, the <accountName> tag and print the text between its opening and closing tags, i.e. "state bank of america" and "rahuk".

This is what I have so far:

function countString(document_r, a, b) {
    var test = document_r.body;
    var text = typeof test.textContent == 'string' ? test.textContent : test.innerText;
    var testRE = text.match(a + "(.*)" + b);
    return testRE[1];
}

chrome.extension.sendMessage({
    action: "getSource",
    source: "XML DETAILS>>>>>"+"\nAccount name is: " +countString(document,'<accountName>','</accountName>')
});

But this prints the inner text of only the first matching tag on the page, i.e. "state bank of america".

What if I want to print only "rahuk", which is the inner text of the last matching tag on the page, or both?

How do I print the inner text of the last matching tag, or of all matching tags?

Thanks in advance.

EDIT: The document above is itself an HTML page; I have just pasted the contents of the page.

UPDATE: Following the suggestions below, the best I could come up with is this code:

function countString(document_r) {
    var test = document_r.body;
    var text = test.innerText;

    var tag = "accountName";
    var regex = "<" + tag + ">(.*?)<\/" + tag + ">";
    var regexg = new RegExp(regex, "g");
    var testRE = text.match(regexg);
    return testRE;
}

chrome.extension.sendMessage({
    action: "getSource",
    source: "XML DETAILS>>>>>"+"\nAccount name is: " +countString(document)
});

But this gave me :

XML DETAILS>>>>> Retirement Program (Profit-Sharing Retirement Plan (PSRP) and Money Purchase Pension Plan (MPPP)),Retirement Program (Profit-Sharing Retirement Plan (PSRP) and Money Purchase Pension Plan (MPPP)),Retirement Program (Profit-Sharing Retirement Plan (PSRP) and Money Purchase Pension Plan (MPPP))

This happened because the same XML was present in the page three times. What I want is for the regex to match only the last XML block, and I don't want the tag names in the output either.

So my desired output would be:

XML DETAILS>>>>> Retirement Program (Profit-Sharing Retirement Plan (PSRP) and Money Purchase Pension Plan (MPPP))

Answer 1

Regex pattern like this: <accountName>(.*?)<\\/accountName>

var tag = "accountName";
var regex = "<" + tag + ">(.*?)<\/" + tag + ">";
var testRE = text.match(regex);

=> testRE contains the match; with tag = "accountName" the values to find are "state bank of america" and "rahuk".

UPDATE

According to this page, to receive all matches instead of only the first one, you must add the "g" flag to the pattern.

"g: The global search flag makes the RegExp search for a pattern throughout the string, creating an array of all occurrences it can find matching the given pattern." found here

Hope this helps you!
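
A minimal sketch of how the "g" flag can be combined with a capture group, assuming the page text is already available as a string. Note that `String.prototype.match` with a global regex returns the full matches (tags included) and drops the capture groups, so this sketch loops with `RegExp.prototype.exec` instead. The helper name `extractTagTexts` and the sample string are made up for illustration.

```javascript
// Hypothetical helper: returns the inner text of every <tag>...</tag> pair in `text`.
function extractTagTexts(text, tag) {
    // Assumes `tag` is a plain name with no regex metacharacters.
    // [\s\S]*? is lazy and, unlike ".", also spans line breaks.
    var re = new RegExp("<" + tag + ">([\\s\\S]*?)</" + tag + ">", "g");
    var results = [];
    var m;
    while ((m = re.exec(text)) !== null) {
        results.push(m[1]); // capture group 1: the text without the tags
    }
    return results;
}

// Illustrative sample, not the asker's real page.
var sample = "x <accountName>state bank of america</accountName> y " +
             "<accountName>rahuk</accountName> z";
var all = extractTagTexts(sample, "accountName");
var last = all.length ? all[all.length - 1] : null; // null when nothing matched
console.log(all);
console.log(last);
```

Taking the last array element gives the inner text of the last matching tag; the whole array covers the "print all of them" case.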

Answer 2

Your match call is not global; build the pattern with the "g" flag:

var regex = new RegExp(a+"(.*)"+b, "g");
text.match(regex);
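
One thing to watch with the pattern above: `(.*)` is greedy, so when two elements sit on the same line it can swallow everything up to the last closing tag. The following sketch, using a made-up one-line sample, compares it with the lazy form `(.*?)`.

```javascript
// Illustrative input: two elements on one line.
var line = "<accountName>a</accountName><accountName>b</accountName>";

// Greedy: (.*) runs to the last closing tag on the line, giving one oversized match.
var greedy = line.match(new RegExp("<accountName>(.*)</accountName>", "g"));

// Lazy: (.*?) stops at the first closing tag, giving one match per element.
var lazy = line.match(new RegExp("<accountName>(.*?)</accountName>", "g"));

console.log(greedy.length);
console.log(lazy.length);
```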

Answer 3

If the full XML string is valid, you can parse it into an XML document using the DOMParser.parseFromString method :

var xmlString = '<root>[Valid XML string]</root>';
var parser = new DOMParser();
var doc = parser.parseFromString(xmlString, 'text/xml');

Then you can get a list of tags with a specified name directly:

var found = doc.getElementsByTagName('tagName');
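
Putting the two steps together, a rough sketch might look like the following. It assumes a browser environment (where `DOMParser` exists) and a well-formed XML string; the helper `tagTexts` and the sample XML are hypothetical. When the string is not well-formed, browsers return a document containing a `parsererror` element rather than throwing, so the sketch checks for that.

```javascript
// Hypothetical helper: text content of every element named `tagName` in `doc`.
function tagTexts(doc, tagName) {
    return Array.from(doc.getElementsByTagName(tagName), function (el) {
        return el.textContent;
    });
}

// DOMParser is a browser API, so this part only runs where it exists.
if (typeof DOMParser !== "undefined") {
    // Illustrative XML with a single root element.
    var xml = "<root><accountName>state bank of america</accountName>" +
              "<accountName>rahuk</accountName></root>";
    var doc = new DOMParser().parseFromString(xml, "text/xml");
    if (doc.getElementsByTagName("parsererror").length > 0) {
        console.log("The string is not well-formed XML");
    } else {
        var names = tagTexts(doc, "accountName");
        console.log(names[names.length - 1]); // last accountName in document order
    }
}
```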

Here's a jsFiddle example using the XML you provided, with two minor tweaks: I had to add a root element and an opening tag for the first site.

Answer 4

You don't need regular expressions for this task (besides, read "RegEx match open tags except XHTML self-contained tags" for why it's not a good idea!). You can do this entirely with JavaScript's DOM methods:

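var tag = "section";
var targets = document.getElementsByTagName(tag);
// Walk the collection from the last element to the first.
// Valid indexes are 0 .. length - 1; targets[targets.length] is undefined.
for (var i = targets.length - 1; i >= 0; i--) {
    console.log(targets[i].innerText);
}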

detecting multiple html tags with javascript and regex

Question

4 answers

solution1
1 2013-10-23 07:12:11

solution2
1 2013-10-23 07:12:18

solution3
1 2013-10-23 08:41:44

solution4
0 2013-10-23 08:03:56
