
How can I put a line break between p tags in Cheerio?

I'm scraping some paragraphs from a website and I've run into a problem that I don't know how to resolve.

The structure is something like this, for example:

<div class = "container">
   <p> This is a long paragraph 1. </p>
   <p> This is a long paragraph 2. </p>
   <p> This is a long paragraph 3. </p>
   <p> This is a long paragraph 4. </p>
</div>

So I did something like this to get the text inside the example paragraphs I've just mentioned.

const axios = require('axios');
const cheerio = require('cheerio');

function scrapeData() {
    let data = []
    let url = `scraping-url`;
    axios(url)
    .then(response =>{
        const html = response.data
        const $ = cheerio.load(html, {xmlMode: true})

        $('.container', html).each(function(){
            const text = $(this).find('p').text()
            data.push({
              text
            })
            console.log(data)
        })

    }).catch(err => console.log(err))
}

But the result I get is {This is a long paragraph 1.This is a long paragraph 2.This is a long paragraph 3.This is a long paragraph 4.}, with all the paragraphs stuck together. I want to separate them so that each paragraph is its own chunk of text.

I want it to look like this in my console.log(data):

{
    This is a long paragraph 1.
    This is a long paragraph 2.
    This is a long paragraph 3.
    This is a long paragraph 4.
}

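Calling .text() on a selection that matches several elements returns their combined text, which is why the paragraphs run together. Adapt the selector to match the p tags, then loop through each one and build your data.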

Try this:

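    // Select every p tag inside the container. The extra `html` argument is
    // not needed because $ was already created from that HTML.
    $('.container p').each(function(){
        // $(this) wraps a single p, so .text() returns only that paragraph
        const text = $(this).text();
        data.push({
          text
        });
    });

    console.log(data);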

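Alternatively, insert a newline after each p tag and then read the text of the container rather than of the p tags (the inserted newlines are siblings of the p tags, so they are not part of each p's own text):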

$('p').after("\n")

Or when you join them:

$('p').get().map(p => $(p).text()).join("\n")
