
JavaScript equivalent of php DOMDocument Object

I wrote a code in PHP for parsing data that I received by an API request from "wikipedia.org". I used DOMDocument class to parse the data and it worked perfectly fine. Now I want to do the same job in JavaScript. The API request returns (after a little cleaning up) a string like this:

$htmlString = "<ul>
    <li>Item 1</li>
    <li>Item 2</li>
</ul>
<ul>
    <li>Item 3</li>
    <li>Item 4</li>
    <li>Item 5</li>
</ul>";

Note that this is just an example. Any request might have a different number of lists, but it is always a series of unordered lists. I needed the text inside the <li> tags, and the following PHP code worked perfectly fine.

$DOM = new DOMDocument;
$DOM->loadHTML($htmlString);
$lis = $DOM->getElementsByTagName('li');
$items =[];
for ($i = 0; $i < $lis->length; $i++) $items[] = $lis[$i]->nodeValue;

And I get the array [Item 1, ..., Item 5] in the $items variable, as I wanted. Now I want to do the same job in JavaScript. That is, I have a string

const htmlString = `<ul>
    <li>Item 1</li>
    <li>Item 2</li>
</ul>
<ul>
    <li>Item 3</li>
    <li>Item 4</li>
    <li>Item 5</li>
</ul>`;

in JavaScript, and I want to get the text inside each of the <li> tags. I searched the web for a JavaScript equivalent of PHP's DOMDocument class and, surprisingly, found nothing. Any ideas how to do this in (preferably vanilla) JavaScript, similar to the PHP code? If not, how could I do it in JavaScript some other way (maybe even with regular expressions)?

Use DOMParser()

Here is your code ported to JavaScript; it is very much the same as your PHP:

let parser = new DOMParser();
let doc = parser.parseFromString(`<ul>
    <li>Item 1</li>
    <li>Item 2</li>
</ul>
<ul>
    <li>Item 3</li>
    <li>Item 4</li>
    <li>Item 5</li>
</ul>`, "text/html");
let lis = doc.getElementsByTagName('li');
let items = [];
for (let i = 0; i < lis.length; i++) items.push(lis[i].textContent);
console.log(items);
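A slightly more compact variant of the same idea, offered as a sketch: it separates parsing from extraction so the extraction step works on anything that has a `querySelectorAll` method (a Document or a single Element). Note that `textContent` plays the role of PHP's `nodeValue` here, because in the browser DOM `nodeValue` is `null` for element nodes. `DOMParser` is a browser API and is not built into Node.js (there you would need a separate DOM implementation such as the jsdom package, not shown). The helper names `parseHtml` and `listItemTexts` are hypothetical.

```javascript
// Hypothetical helper: parse an HTML string into a Document.
// DOMParser is a browser API; it is not available in plain Node.js.
function parseHtml(html) {
  return new DOMParser().parseFromString(html, "text/html");
}

// Hypothetical helper: collect the trimmed text of every <li> under `root`.
// `root` can be a Document or an Element (anything with querySelectorAll).
function listItemTexts(root) {
  // querySelectorAll returns a static NodeList; Array.from's second argument
  // maps each <li> to its text, like the PHP loop over nodeValue.
  // trim() drops any surrounding whitespace inside the <li>.
  return Array.from(root.querySelectorAll("li"), (li) => li.textContent.trim());
}

// Usage in a browser, with htmlString as in the question:
// const items = listItemTexts(parseHtml(htmlString));
// console.log(items);
```

Keeping extraction separate from parsing also means the same function can be called on one `<ul>` element if you ever need the items of a single list.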

If you're working strictly with strings, you'll want to use regular expressions.

FYI, I'm using ES20xx syntax (arrow functions and template literals). If you can't support this, you'll need to convert it to syntax your users' browsers can run.

Here I use an expression that removes the <ul> and </ul> tags and replaces each <li>...</li> pair with the text it contains. Then I split the resulting string on line breaks into an array, filter out the empty elements, and strip the leftover indentation from each remaining item to get the final array.

var htmlString = `<ul>
    <li>Item 1</li>
    <li>Item 2</li>
</ul>
<ul>
    <li>Item 3</li>
    <li>Item 4</li>
    <li>Item 5</li>
</ul>`;
// Drop <ul>/</ul>, replace each <li>...</li> with its contents, then split on newlines.
var lis = htmlString.replace(/<ul>|<li>(.*)<\/li>|<\/ul>/g, '$1').split('\n');
var items = lis.filter(item => {
    // Keep only non-empty lines.
    if (item && item !== null && item !== '') {
        return item;
    }
}).map(item => {
    // Remove the leftover indentation (runs of two or more whitespace characters).
    var element = item.replace(/\s{2,}/g, '');
    return element;
});
console.log('items array:', items);
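The regex answer depends on each `<li>` sitting on its own line: `.` does not match line breaks, and if two items shared a line the greedy `(.*)` would swallow everything from the first `<li>` to the last `</li>`. A sketch that avoids that dependency collects the matches directly with a non-greedy pattern and a `RegExp.prototype.exec` loop, written in ES5 syntax so it also runs in older engines. The function name `extractListItems` is hypothetical, and like any regex approach it assumes simple, non-nested `<li>` elements with no HTML entities to decode; for arbitrary HTML a real parser such as DOMParser is more robust. The sample input deliberately puts the second list on a single line.

```javascript
// Hypothetical helper: return the text between each <li ...> and </li>.
// Written in ES5 syntax (var, function, exec loop) so it needs no transpiling.
function extractListItems(html) {
  // <li\b[^>]*> allows attributes on <li> but not tags like <link>;
  // [\s\S]*? matches lazily across line breaks; the i flag accepts <LI>.
  // The g flag makes exec() continue from lastIndex on each call.
  var pattern = /<li\b[^>]*>([\s\S]*?)<\/li>/gi;
  var items = [];
  var match;
  while ((match = pattern.exec(html)) !== null) {
    // Strip leading/trailing whitespace left over from indentation.
    items.push(match[1].trim());
  }
  return items;
}

var htmlString = "<ul>\n    <li>Item 1</li>\n    <li>Item 2</li>\n</ul>\n" +
  "<ul><li>Item 3</li><li>Item 4</li><li>Item 5</li></ul>";
console.log(extractListItems(htmlString));
```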
