How to parse and fetch text from HTML tags in Node.js?

I have some HTML as a string in Node.js, as follows:

 var htmlText = `<div class="X7NTVe"> <a class="tHmfQe" href="/link1"> <div class="am3QBf"> <div> <span> <div class="BNeawe deIvCb AP7Wnd"> <span dir="rtl">My First Text</span> </div> </span> </div> </div> </a> <div class="HBTM6d XS7yGd"> <a href="/anotherLink1"> <div class="BNeawe mAdjQc uEec3 AP7Wnd">&gt;</div> </a> </div> </div> <div class="x54gtf"></div> <div class="X7NTVe"> <a class="tHmfQe" href="/link2"> <div class="am3QBf"> <div> <span> <div class="BNeawe deIvCb AP7Wnd"> <span dir="rtl">My Second Text</span> </div> </span> </div> </div> </a> <div class="HBTM6d XS7yGd"> <a href="/anotherLink2"> <div class="BNeawe mAdjQc uEec3 AP7Wnd">&gt;</div> </a> </div> </div> <div class="x54gtf"></div>`

Now I want to fetch the text from it as an array. For the example above, it must return My First Text and My Second Text. How can I do that?

Note: I want to do this in Node.js, not in browser JavaScript.

With cheerio:

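const cheerio = require('cheerio')
// Load the HTML string from the question (htmlText)
let $ = cheerio.load(htmlText)
// Select the spans directly inside the target divs and collect their text
let strings = $('div[class="BNeawe deIvCb AP7Wnd"]>span[dir]')
              .get().map(span => $(span).text())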

Method #1

Strip all tags with the regex /<[^>]*>/g and keep the text that remains.
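
A minimal sketch of method #1, using only built-in string methods. The helper name extractTextFragments and the shortened sample string are assumptions made for illustration; they are not part of the question. Each tag is replaced with a line break so that neighbouring text fragments stay separate:

```javascript
// Hypothetical helper: strips every tag and returns the non-empty text fragments.
function extractTextFragments(html) {
  return html
    .replace(/<[^>]*>/g, '\n')      // replace each tag with a line break
    .split('\n')                    // one candidate fragment per line
    .map(s => s.trim())             // drop surrounding whitespace
    .filter(s => s.length > 0);     // discard empty fragments
}

// Shortened stand-in for the htmlText from the question (assumption).
const sample = '<div class="a"> <span dir="rtl">My First Text</span> </div> ' +
               '<div class="b">&gt;</div> ' +
               '<div class="a"> <span dir="rtl">My Second Text</span> </div>';

const fragments = extractTextFragments(sample);
console.log(fragments); // [ 'My First Text', '&gt;', 'My Second Text' ]
```

Note that the regex cannot tell which element a fragment came from and does not decode entities, so the &gt; arrow text is returned as well and would have to be filtered out separately.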

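Method #2

Parse the HTML with jsdom and access the nodes through the standard DOM document API.

Method #2 is much more flexible, because you can select exactly the elements you need instead of stripping every tag.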
