
JavaScript for extracting anchor text from an anchor tag

I need help with the following.

In JavaScript, I need to pass an input string to a function,

for example:

str = "<a href=www.google.com>Google</a>"; // example only; the actual input varies
// str is passed as a parameter to the JavaScript function

The output should be 'Google'.

I have a regex in Java, and it works fine there.

String regex = "<a[^>]*>(.*?)</a>";
Pattern p = Pattern.compile(regex, Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

But in JavaScript it does not work.

How can I do this in JavaScript? Can anyone help me with a JavaScript implementation?

I don't think you want to use a regex for this. You can simply try something like this:

<a id="myLink" href="http://www.google.com">Google</a>

    var anchor = document.getElementById("myLink");

    alert(anchor.getAttribute("href")); // Extract link

    alert(anchor.innerHTML); // Extract Text


EDIT: (as rightly pointed out by Patrick Evans in the comments)

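var str = "<a href=www.google.com>Google</a>";
var str1 = document.createElement('div'); // detached container, never added to the page
str1.innerHTML = str;                     // let the browser parse the markup
alert(str1.textContent);                  // standard property
alert(str1.innerText);                    // fallback for older IE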


Insert the HTML string into an element, and then just get the text:

var str = "<a href=www.google.com>Google</a>";
var div = document.createElement('div');

div.innerHTML = str;
var txt = div.textContent ? div.textContent : div.innerText;


In jQuery this would be:

var str = "<a href=www.google.com>Google</a>";
var txt = $(str).text();


From the suggestions you all gave, I arrived at an answer that works for me:

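function extractText(anchText) {
    // e.g. anchText = "<a href=www.google.com>Google</a>"
    var str1 = document.createElement('div'); // detached container element
    str1.innerHTML = anchText;                // parse the markup
    alert("hi " + str1.innerText);
    return str1.innerText;                    // return the extracted text
}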

Thanks, everyone, for the support.

I'm just going to take an initial stab at this; I can update it if you add more test cases or details to your question:

\w+="<.*>(.*)</.*>"

This matches your provided example. In addition, it doesn't matter if:

  • the variable name is different
  • the tag, or the contents of the tag wrapping the text, are different

What will break it, specifically, is angle brackets inside your HTML tag, which is possible.

Note: It is a much better idea to do this with the DOM, as the other answers do; I only answered with a regex because that is what the OP asked for. To the OP: if you can do this without a regex, do that instead. Avoid parsing HTML with a regex whenever possible; this regex is not comparable to a full HTML parser.
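
For completeness, here is a minimal sketch of how the Java pattern from the question might be ported to a JavaScript regex literal. The function name extractAnchorText is hypothetical, and the sketch assumes the input contains a single, simple anchor; it has the same limitations as any regex applied to HTML.

```javascript
// Hypothetical helper (not from the question): returns the text between the
// first <a ...> and its closing </a>, or null when no anchor is found.
// This is a sketch for simple inputs, not an HTML parser.
function extractAnchorText(html) {
  // [\s\S]*? plays the role of Java's Pattern.DOTALL with a lazy match;
  // the i flag plays the role of Pattern.CASE_INSENSITIVE.
  var match = /<a\b[^>]*>([\s\S]*?)<\/a>/i.exec(html);
  return match ? match[1] : null;
}

console.log(extractAnchorText("<a href=www.google.com>Google</a>")); // Google
```

Note that the forward slash in </a> has to be escaped inside a regex literal. If the anchor contains nested markup such as <b>, the captured group will still include those tags.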

No need for a regex. Just parse the string with DOMParser, get the element, and then use the DOM object's methods and attributes:

var parser = new DOMParser();
var str = "<a href='www.google.com'>Google</a>";
var dom = parser.parseFromString(str, "text/xml"); // "text/xml" needs well-formed markup (quoted attributes)

//From there use dom like you would use document
var atags = dom.getElementsByTagName("a");
console.log( atags[0].textContent );

//Or
var atag = dom.querySelector("a");
console.log( atag.textContent );

//Or
var atag = dom.childNodes[0];
console.log( atag.textContent );

The only catch is that DOMParser is not supported in IE versions below 9.

Well, if you're using jQuery, this should be an easy task.

I would just create an invisible div and render the anchor (<a>) inside it. Afterwards you can simply select the anchor and get its inner text.

$('body').append('<div id="invisibleDiv" style="display:none;"></div>'); // create a new invisible div
$('#invisibleDiv').html(str); // put your "str" content inside the invisible div
console.log($('a', '#invisibleDiv').html()); // logs the inner HTML of the first anchor inside that div

Remember that to do it this way you must have jQuery loaded on your page.

EDIT: Use this only if you already have jQuery in your project since, as stated below, something as simple as this should not be a reason to include the entire library.

Assuming from the provided code that you are using Java:

I would recommend using JSoup to extract the text inside the anchor tag.
For the reasons why, see "Using regular expressions to parse HTML: why not?"

String html = "<a href='www.google.com'>Google</a>";
Document doc = Jsoup.parse(html);
Element link = doc.select("a").first();

String linkHref = link.attr("href"); // "www.google.com"
String linkText = link.text(); // "Google"

String linkOuterH = link.outerHtml();
// "<a href=\"www.google.com\">Google</a>"
String linkInnerH = link.html(); // "Google"
