
Node.js: how to create an array of specific objects based on data from html string?

I'm a beginner in Node.js and, for testing purposes, I wanted to create a simple application that creates an array of objects based on given HTML.

Let me explain: I have an HTML string that contains multiple div elements like this:

<div class="user_container">
    <div class="user">
        <div class="thumb">
            <!--            thumbnail block-->
        </div>
        <div class="web_presence_locations"></div>

        <div class="user_data">
            <span class="name">Jaroslaw Chujczynski</span>
            <p class="location_with_flag">
                <!--                img with url here-->
                Leeds,
                United Kingdom
            </p>
            <div class="user_details">
                <div class="amount currency">
                    £28,000.00
                    <span class="overbooked">(in overfunding)</span>
                </div>
            </div>
        </div>
    </div>
    <div class="profile_container">
        <div class="extra_profile_data" style="">
            <div class="investments last">
                <h3 class="h5">Recent Investments</h3>
                <ul>
                    <li class="first">
                        <div class="campaign-logo-frame">
                            <a class="campaign_link" href="/test1">test1</a>
                            <span class="currency">£28,000.00</span>
                        </div>
                    </li>
                    <li class="">
                        <div class="campaign-logo-frame">
                            <a class="campaign_link" href="/test2">test2</a>
                            <span class="currency">£28,000.00</span>
                        </div>
                    </li>
                    <li class="">
                        <div class="campaign-logo-frame">
                            <a class="campaign_link" href="/test3">test3</a>
                            <span class="currency">£28,000.00</span>
                        </div>
                    </li>
                    <li class="">
                        <div class="campaign-logo-frame">
                            <a class="campaign_link" href="/test4">test4</a>
                            <span class="currency">£28,000.00</span>
                        </div>
                    </li>
                </ul>
            </div>
        </div>
    </div>
</div>

What I want to do is create an object from the data in the div above, for example something like this:

{
  name: 'Jaroslaw Chujczynski',
  location: 'Leeds, United Kingdom',
  amountCurrency: '£28,000.00 (in overfunding)',
  lastInvestments: [
    {
      name: 'test1',
      currency: '£28,000.00'
    }, {
      name: 'test2',
      currency: '£28,000.00'
    }, {
      name: 'test3',
      currency: '£28,000.00'
    }, {
      name: 'test4',
      currency: '£28,000.00'
    }
  ]
}

Of course, there will be many divs like this in my HTML, so I'll create an array of such objects.

OK, so this is what I have at the moment:

const fs = require('fs');
const cheerio = require('cheerio');

const getAllData = (fileName) => {
    try {
        return  fs.readFileSync(fileName, 'utf8');
    } catch(e) {
        console.log('Error:', e.stack);
    }
}
const data = getAllData('test.html');
const $ = cheerio.load(data);

const filterData = () => {
    console.log($('div[class="user_container"]'));
}

filterData();

It prints something like this, which is not what I want (or is this how it is supposed to look?):

 namespace: 'http://www.w3.org/1999/xhtml',
    attribs: [Object: null prototype] {
      class: 'user_container'
    },
    'x-attribsNamespace': [Object: null prototype] {
      class: undefined
    },
    'x-attribsPrefix': [Object: null prototype] {
      class: undefined
    },
    children: [ [Node], [Node], [Node], [Node], [Node], [Node] ],
    parent: Node {
      type: 'tag',
      name: 'section',
      namespace: 'http://www.w3.org/1999/xhtml',
      attribs: [Object: null prototype],
      'x-attribsNamespace': [Object: null prototype],
      'x-attribsPrefix': [Object: null prototype],
      children: [Array],
      parent: [Node],
      prev: [Node],
      next: [Node]
    },
    etc....

I'm not sure, but I think I first have to get an array of the div blocks whose class is user_container, and then iterate over that array to create an object for each of them.

Can someone help me with this?

HTML is closely related to XML, so XML tooling is worth a look: have one of those tools parse the HTML, then run XML queries against the result. That lets you extract the relevant XML, which you can then convert to JSON. Keep in mind that a strict XML parser will only accept HTML that is also well-formed XML.

A quick Google search turns up the following XML tools for Node.js, but there are many more:

https://www.npmjs.com/package/fast-xml-parser - says that it will also export to JSON

http://www.curtismlarson.com/blog/2018/10/03/edit-xml-node-js/ - has a detailed walkthrough.

I can get you started at least:

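// Build one plain object per .user_container element.
// Named `users` because `data` is already declared in the question's code.
const users = $('.user_container').get().map(div => {
  return {
    name: $(div).find('.name').text(),
    location: $(div).find('.location_with_flag').text(),
    amountCurrency: $(div).find('.amount.currency').text(),
  };
});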


 