
Find html text between two tags using jquery or cheerio

I thought this would be fairly straightforward, but nothing I've tried works. I'm writing this with cheerio in Node.js. Basically, I have the following HTML:

<h2 id="understanding-adc">
<a class="anchor" href="#understanding-adc" aria-hidden="true"><span class="octicon octicon-link"></span></a>Understanding ADC</h2>

<p>test</p>

<ol>
  <li>test</li>
  <li>test</li>
  <li>Optimization</li>
</ol>

<h2 id="data-switching">
<a class="anchor" href="#data-switching" aria-hidden="true"><span class="octicon octicon-link"></span></a>Data switching</h2>

<p>test test.</p>

So the scenario is this: if I pass an h2 id, say "#understanding-adc", I need to get the content between "#understanding-adc" and the next h2 tag, "#data-switching". I know which h2 I pass to the function, but not which h2 comes after it.

The result I'm looking for is this:

<h2 id="understanding-adc">
<a class="anchor" href="#understanding-adc" aria-hidden="true"><span class="octicon octicon-link"></span></a>Understanding ADC</h2>

<p>test</p>

<ol>
  <li>test</li>
  <li>test</li>
  <li>Optimization</li>
</ol>

Please help me

First select the starting <h2>, then use nextUntil() to collect the sibling elements that follow it, up to but not including the next <h2>. Call addBack() to put the starting <h2> back into the result, use wrapAll() to wrap that chunk in a new element, and grab its HTML with parent() and html().

const cheerio = require("cheerio"); // 1.0.0-rc.12

const html = `
<h2 id="understanding-adc">
<a class="anchor" href="#understanding-adc" aria-hidden="true"><span class="octicon octicon-link"></span></a>Understanding ADC</h2>

<p>test</p>

<ol>
  <li>test</li>
  <li>test</li>
  <li>Optimization</li>
</ol>

<h2 id="data-switching">
<a class="anchor" href="#data-switching" aria-hidden="true"><span class="octicon octicon-link"></span></a>Data switching</h2>

<p>test test.</p>
`;
const $ = cheerio.load(html);

// make sure we're in a container
$("body").children().wrapAll("<div></div>");

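const htmlOut = $("#understanding-adc")
  .nextUntil("h2") // siblings after the start <h2>, stopping before the next <h2>
  .addBack() // add the start <h2> back into the selection
  .wrapAll("<div></div>") // pass markup: a bare "div" would be read as a selector, not a new element
  .parent() // the new wrapper <div>
  .html(); // its inner HTML is the section we want
console.log(htmlOut);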
