
How to parse an HTML table to JSON in Node.js with or without third-party libraries

I receive the following response from a server and need to convert it to JSON in Node.js, with or without additional libraries. I know this topic has come up a few times, but I couldn't find an existing answer that fits this case.

<table class="sortable table">
    <tbody>
        <tr>
            <th width="5%">Rank</th>
            <th width="20%">Name</th>
            <th width="30%">Image</th>
            <th width="20%">Country</th>
            <th width="10%">Population</th>
        </tr>
        <tr bgcolor="#79ff76">
            <td align="center"><b>1</b></td>
            <td align="center"><a href="/link/Tokyo" title="Tokyo">Tokyo</a></td>
            <td>
                <a href="/img/Skyscrapers_of_Shinjuku_2009_January.jpg" class="image">
                    <img alt="Skyscrapers of Shinjuku 2009 January.jpg" src="/img/Skyscrapers_of_Shinjuku_2009_January.jpg"
                        width="200" height="200" />
                </a>
            </td>
            <td align="center"><a href="/link/Japan" title="Japan">Japan</a></td>
            <td align="center"><b>39,800,000</b></td>
        </tr>
        <tr bgcolor="#abd5f5">
            <td align="center">2</td>
            <td align="center"><a href="/link/Jakarta" title="Jakarta">Jakarta</a></td>
            <td>
                <a href="/img/Jakarta_Car_Free_Day.jpg" class="image">
                    <img alt="Jakarta Car Free Day.jpg" src="/img/Jakarta_Car_Free_Day.jpg" width="200" height="200" />
                </a>
            </td>
            <td align="center"><a href="/link/Indonesia" title="Indonesia">Indonesia</a></td>
            <td align="center">28,900,000</td>
        </tr>
    </tbody>
</table>

The output should look like this:

[
    {
      "name": "Tokyo",
      "country": "Japan",
      "population": 39800000,
      "url": "link/Tokyo"
    },
    {
      "name": "Jakarta",
      "country": "Indonesia",
      "population": 28900000,
      "url": "link/Jakarta"
    }
]

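In Node you can use new DOMParser().parseFromString('<hi>Hello</hi>', 'text/html'), which returns a DOM tree whose properties you can walk to build your object. Note that DOMParser is a browser API and is not a Node.js global, so it has to be supplied by a library such as jsdom.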

Assuming the position of the data in the table won't change, you can use this code. Ideally the cells would carry an id or class that identifies the data; if you can arrange that, change the selectors accordingly.

https://repl.it/@rafaelcastrocouto/Peter-M-Question

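// Requires the jsdom package (npm install jsdom).
const { JSDOM } = require('jsdom');

JSDOM.fromFile('table.html').then(function (dom) {
  const tableRows = dom.window.document.querySelectorAll('table tr');
  const array = [];
  // Start at 1 to skip the header row.
  for (let i = 1; i < tableRows.length; i++) {
    const row = tableRows[i];
    const link = row.querySelector('td:nth-child(2) a');
    array.push({
      name: link.textContent.trim(),
      country: row.querySelector('td:nth-child(4)').textContent.trim(),
      // Strip thousands separators so "39,800,000" becomes the number 39800000.
      population: Number(row.querySelector('td:nth-child(5)').textContent.replace(/,/g, '')),
      // Read the raw attribute (the .href property returns a resolved absolute URL)
      // and drop the leading slash to match the requested output.
      url: link.getAttribute('href').replace(/^\//, '')
    });
  }
  const jsonString = JSON.stringify(array, null, 2);
  console.log(jsonString);
}).catch(console.error);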
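
For the "without third-party libraries" half of the question, here is a minimal sketch that uses only regular expressions from plain Node.js. It assumes the markup is as regular as the sample: no nested tables, every data row has five td cells, and href values are double-quoted. The names parseCityTable and stripTags are made up for this illustration. Regular expressions are not a general HTML parser, so treat this as a fallback for fixed, known markup rather than a replacement for jsdom.

```javascript
// Dependency-free sketch; only suitable for markup as regular as the sample table.
// parseCityTable and stripTags are hypothetical helper names, not library APIs.

// Remove tags and collapse whitespace to get a cell's visible text.
function stripTags(html) {
  return html.replace(/<[^>]*>/g, '').replace(/\s+/g, ' ').trim();
}

function parseCityTable(html) {
  const result = [];
  // Each <tr>...</tr> is one row; [\s\S] lets the match span line breaks.
  const rows = html.match(/<tr[\s\S]*?<\/tr>/gi) || [];
  for (const row of rows) {
    // Inner HTML of every <td>; the header row only has <th>, so it yields none.
    const cells = [...row.matchAll(/<td[^>]*>([\s\S]*?)<\/td>/gi)].map(m => m[1]);
    if (cells.length < 5) continue; // skip the header or any malformed row
    // Take the href from the name cell and drop its leading slash.
    const href = (cells[1].match(/href="([^"]*)"/i) || [])[1] || '';
    result.push({
      name: stripTags(cells[1]),
      country: stripTags(cells[3]),
      population: Number(stripTags(cells[4]).replace(/,/g, '')),
      url: href.replace(/^\//, '')
    });
  }
  return result;
}

// Abbreviated copy of the header and first data row from the question.
const sample = `<table class="sortable table"><tbody>
  <tr><th>Rank</th><th>Name</th><th>Image</th><th>Country</th><th>Population</th></tr>
  <tr bgcolor="#79ff76">
    <td align="center"><b>1</b></td>
    <td align="center"><a href="/link/Tokyo" title="Tokyo">Tokyo</a></td>
    <td><a href="/img/Skyscrapers_of_Shinjuku_2009_January.jpg" class="image">
      <img alt="Skyscrapers of Shinjuku 2009 January.jpg" src="/img/Skyscrapers_of_Shinjuku_2009_January.jpg" />
    </a></td>
    <td align="center"><a href="/link/Japan" title="Japan">Japan</a></td>
    <td align="center"><b>39,800,000</b></td>
  </tr>
</tbody></table>`;

const cities = parseCityTable(sample);
console.log(JSON.stringify(cities, null, 2));
```

To use it on the real response, pass the HTML string you received from the server to parseCityTable instead of sample.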
