Get element by id with regex

Question

I had a quick question regarding RegEx...

I have a string that looks something like the following:

"This was written by <p id="auth">John Doe</p> today!"

What I want to do (with javascript) is basically extract out the 'John Doe' from any tag with the ID of "auth".

Could anyone shed some light? I'm sorry to ask.

Full story: I am using an XML parser to pass data into variables from a feed. However, there is one tag in the XML document () that contains HTML passed into a string. It looks something like this:

 <item>
  <title>This is a title</title>
  <description>
  "By <p id="auth">John Doe</p> text text text... so on"
  </description>
 </item>

So as you can see, I can't use an HTML/XML parser for that p tag, because it's in a string, not a document.

Answer 1

There's no need for regular expressions here. Use the DOM instead.

var obj = document.getElementById('auth');
if (obj)
{
    alert(obj.innerHTML);
}

By the way, having multiple ids with the same value on the same page is invalid (and will likely result in odd JS behavior).

If you want many auth elements on the same page, use class instead of id. Then you can use something like:

// IIRC getElementsByClassName is new in FF3; you might consider using jQuery
// to do this in a more "portable" way, but you get the idea...
var objs = document.getElementsByClassName('auth');
if (objs)
{
    for (var i = 0; i < objs.length; i++)
        alert(objs[i].innerHTML); // index the collection (objs), not obj
}

EDIT: Since you want to parse a string that contains some HTML, you won't be able to use my answer as-is. Will your HTML string contain a whole HTML document? Only part of one? Valid HTML? Partial (broken) HTML?

Answer 2

Here's a way to get the browser to do the HTML parsing for you:

var string = "This was written by <p id=\"auth\">John Doe</p> today!";

var div = document.createElement("div");

div.innerHTML = string; // get the browser to parse the html

var children = div.getElementsByTagName("*");

for (var i = 0; i < children.length; i++)
{
    if (children[i].id == "auth")
    {
        alert(children[i].textContent);
    }
}

If you use a library like jQuery, you could hide the for loop and replace the use of textContent with something cross-browser.
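Without a library, one way to cover the cross-browser point is a small helper that prefers textContent and falls back to innerText, which older versions of Internet Explorer offered instead. This is only a sketch: getNodeText is a hypothetical name, and the plain objects below merely stand in for DOM elements so the snippet runs outside a browser.

```javascript
// Hypothetical helper: read the text of an element-like object.
// Prefers the standard textContent and falls back to innerText.
function getNodeText(node) {
  if (typeof node.textContent === "string") {
    return node.textContent;
  }
  if (typeof node.innerText === "string") {
    return node.innerText;
  }
  return ""; // neither property is available
}

// Plain objects stand in for DOM elements (an assumption for illustration).
console.log(getNodeText({ textContent: "John Doe" }));
console.log(getNodeText({ innerText: "John Doe" }));
```

In the loop above, alert(getNodeText(children[i])) would take the place of the direct textContent access.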

Answer 3

Perhaps something like

document.getElementById("auth").innerHTML.replace(/<[^>]+>/g, '')

might work. innerHTML is supported on all modern browsers. (You may omit the replace if you don't care about removing HTML bits from the inner content.)
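To make the effect of the replace step concrete, here is a minimal sketch that applies a tag-stripping pattern to a plain string rather than to innerHTML. The name stripTags is hypothetical, and the pattern is naive: it assumes no ">" character appears inside an attribute value.

```javascript
// Hypothetical helper: remove anything that looks like an HTML tag.
// Naive by design; it assumes attribute values never contain ">".
function stripTags(html) {
  return html.replace(/<[^>]+>/g, "");
}

// Nested markup inside the author element is reduced to its text.
console.log(stripTags("John <b>Doe</b>"));
```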

If you have jQuery at your disposal, just do

$("#auth").text()

Answer 4

What I want to do (with javascript) is basically extract out the 'John Doe' from any tag with the ID of "auth".

You can't have the same id (auth) on more than one element. An id must be unique within a page.

If, however, you assign a class of auth to the elements, you can do something like this, assuming we are dealing with paragraph elements:

// find all paragraphs
var elms = document.getElementsByTagName('p');

for(var i = 0; i < elms.length; i++)
{
  // find elements with class auth
  if (elms[i].getAttribute('class') === 'auth') {
    var el = elms[i];

    // see if any paragraph contains the string
    if (el.innerHTML.indexOf('John Doe') != -1) {
      alert('Found ' + el.innerHTML);
    }
  }
}
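One caveat with the strict comparison above: a class attribute can hold several space-separated names, so class="auth highlight" would not be matched. A possible way around that, sketched here with a hypothetical helper named hasClassToken, is to split the attribute value on whitespace before comparing.

```javascript
// Hypothetical helper: check whether a class attribute value contains a
// given class name as one of its whitespace-separated tokens.
function hasClassToken(classAttr, name) {
  // getAttribute returns null when the attribute is missing.
  if (typeof classAttr !== "string") {
    return false;
  }
  return classAttr.split(/\s+/).indexOf(name) !== -1;
}

console.log(hasClassToken("auth highlight", "auth")); // true
console.log(hasClassToken("author", "auth"));         // false
```

Inside the loop, the condition would then read hasClassToken(elms[i].getAttribute('class'), 'auth').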

Answer 5

If the content of the tag contains only text, you could use this:

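function getText (htmlStr, id) {
  // Match an opening tag whose id attribute equals `id`, then capture its text.
  var match = new RegExp ("<[^>]+\\sid\\s*=\\s*([\"'])"
    + id
    + "\\1[^>]*>([^<]*)<"
  ).exec (htmlStr);
  // exec returns null when nothing matches, so guard before indexing.
  return match ? match [2] : null;
}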


var htmlStr = "This was written by <p id=\"auth\">John Doe</p> today!";
var id = "auth";
var text = getText (htmlStr, id);
alert (text === "John Doe");
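Since the question asks about "any tag" with that id, a variation on the same idea could collect every match instead of only the first. The sketch below is an assumption-laden extension, not part of the original answer: getAllTexts and escapeRegExp are hypothetical names, the id is escaped so characters such as "." are treated literally, and the same restriction applies (the tag content must be plain text).

```javascript
// Hypothetical helper: escape characters that are special in a RegExp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: return the text of every tag whose id equals `id`.
// Only works when the tag content is plain text (no nested tags).
function getAllTexts(htmlStr, id) {
  var pattern = new RegExp(
    "<[^>]+\\sid\\s*=\\s*([\"'])" + escapeRegExp(id) + "\\1[^>]*>([^<]*)<",
    "g"
  );
  var results = [];
  var match;
  // With the "g" flag, exec resumes after the previous match on each call.
  while ((match = pattern.exec(htmlStr)) !== null) {
    results.push(match[2]);
  }
  return results;
}

var sample = "By <p id=\"auth\">John Doe</p> and <span id='auth'>Jane Roe</span>";
console.log(getAllTexts(sample, "auth"));
```

An empty array signals that no tag with the given id was found.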

Answer 6

Assuming you only have 1 auth per string, you might go with something like this:

var str = "This was written by <p id=\"auth\">John Doe</p> today!",
    p = str.split('<p id="auth">'),
    q = p[1].split('</p>'),
    a = q[0];
alert(a);

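Simple enough. Split your string on the opening paragraph tag, then split the second part on the closing tag, and the first part of the result will be your value. This works as long as the string contains exactly that opening tag; if it doesn't, p[1] is undefined and the second split throws.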
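If the marker might be missing, the same idea can be written with indexOf so that the absence is reported instead of throwing. This is a minimal sketch under that assumption; extractBetween is a hypothetical name, and like the split version it expects the opening tag to appear exactly as written.

```javascript
// Hypothetical helper: return the text between the first `open` marker and
// the next `close` marker, or null when either marker is missing.
function extractBetween(str, open, close) {
  var start = str.indexOf(open);
  if (start === -1) {
    return null;
  }
  start += open.length; // move past the opening marker
  var end = str.indexOf(close, start);
  if (end === -1) {
    return null;
  }
  return str.slice(start, end);
}

var str = "This was written by <p id=\"auth\">John Doe</p> today!";
console.log(extractBetween(str, '<p id="auth">', "</p>"));
```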


6 answers

solution1 2 2010-08-04 19:52:25
solution2 2 ACCEPTED 2010-08-04 20:26:02
solution3 0 2010-08-04 19:51:18
solution4 0 2010-08-04 19:55:59
solution5 0 2010-08-04 20:16:25
solution6 0 2010-08-04 20:28:27
