
Convert string of HTML into JSON Object

I am taking an old hardcoded website of mine and trying to strip the data out of the HTML and drop it into a new JSON object.

Currently I am receiving a table of items (reduced here for simplicity) as one giant string; there are almost 1000 rows. There are no classes or attributes on any of the HTML.

let tableString = `
    <tr>
        <td>01/01/1999</td>
        <td>Item 1</td>
        <td>55</td>
    </tr>
    <tr>
        <td>01/01/2000</td>
        <td>Item 2</td>
        <td>35</td>
    </tr>
`

I am working towards the following object:

[{
    date: '01/01/1999',
    name: 'Item 1',
    cost: 55
},
{
    date: '01/01/2000',
    name: 'Item 2',
    cost: 35
}]

Current code I have implemented:

let newData = []

let stringArray = results.split('</tr>')

stringArray.map(item => {

    let stripped = item.replace('/n', '')
        stripped = stripped.replace('<tr>', '')

    let items = stripped.split('<td>')

    let newItem = {
        data: items[0],
        name: items[1],
        cost: items[2]
    }

    return newData.push(newItem)
})

I am taking the giant string and splitting it at the end of every row. This works; however, it strips the actual </tr> tag out of the item itself and leaves me with an extra empty-string item in my array.

Next I am mapping over each string in my array and trying to strip out all line breaks as well as the <tr> tag, in order to have an array of table cells. Then, in theory, I can build out my object (after I strip the table cell tags out).

However, as I am doing this, replace doesn't seem to be working. Is my thinking correct on how I am moving forward, or should I look at regex patterns to target this better?
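On the replace call in the code above: '/n' is a literal slash followed by the letter n, not the newline escape '\n', and replace with a string pattern only swaps the first occurrence, so the line breaks are never removed. On the regex question, the following is a minimal sketch rather than a definitive answer. It assumes the markup stays as regular as the sample (bare <tr> and <td> tags, no attributes, no nesting), and rowsToObjects is a hypothetical helper name, not something from the post.

```javascript
const tableString = `
    <tr>
        <td>01/01/1999</td>
        <td>Item 1</td>
        <td>55</td>
    </tr>
    <tr>
        <td>01/01/2000</td>
        <td>Item 2</td>
        <td>35</td>
    </tr>
`;

// Hypothetical helper: assumes bare <tr>/<td> tags with no attributes or nesting.
function rowsToObjects(html) {
    const rowPattern = /<tr>([\s\S]*?)<\/tr>/g;   // captures everything inside one row
    const cellPattern = /<td>([\s\S]*?)<\/td>/g;  // captures the text of one cell

    return [...html.matchAll(rowPattern)].map(([, rowHtml]) => {
        const [date, name, cost] = [...rowHtml.matchAll(cellPattern)]
            .map(([, text]) => text.trim());
        // The target object in the question stores cost as a number.
        return { date, name, cost: Number(cost) };
    });
}

const items = rowsToObjects(tableString);
console.log(items);
```

Regular expressions are only dependable here because the input is this uniform; for HTML with attributes or nested tables, a real parser is the safer choice.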

You could just stick the trs into a table and process the data out of the table element.

let tableString = `
    <tr>
        <td>01/01/1999</td>
        <td>Item 1</td>
        <td>55</td>
    </tr>
    <tr>
        <td>01/01/2000</td>
        <td>Item 2</td>
        <td>35</td>
    </tr>
`;

const table = document.createElement('table');
table.innerHTML = tableString;

console.log(
    [...table.querySelectorAll('tr')].map(tr => {
        return {
            date: tr.children[0].innerText,
            name: tr.children[1].innerText,
            cost: tr.children[2].innerText
        };
    })
);

Here's a while loop that uses substring and indexOf. It makes use of the often neglected second parameter of indexOf, which lets you specify the position at which the search starts. It's probably better to just create the HTML table element and read the innerHTML of each td, but if this is easier for you, here you go:

let str = `
    <tr>
        <td>01/01/1999</td>
        <td>Item 1</td>
        <td>55</td>
    </tr>
    <tr>
        <td>01/01/2000</td>
        <td>Item 2</td>
        <td>35</td>
    </tr>
`;

var BEGIN = "<td>";
var END = "</td>";
var objs = [];

// Each step reads the first remaining <td>...</td> pair, then cuts it out of str.
while (str.indexOf(BEGIN) > -1 && str.indexOf(END, str.indexOf(BEGIN)) > -1) {
    var obj = {};

    obj.date = str.substring(str.indexOf(BEGIN) + BEGIN.length, str.indexOf(END, str.indexOf(BEGIN)));
    str = str.substring(0, str.indexOf(BEGIN)) + str.substring(str.indexOf(END, str.indexOf(BEGIN)) + END.length);

    obj.name = str.substring(str.indexOf(BEGIN) + BEGIN.length, str.indexOf(END, str.indexOf(BEGIN)));
    str = str.substring(0, str.indexOf(BEGIN)) + str.substring(str.indexOf(END, str.indexOf(BEGIN)) + END.length);

    obj.cost = str.substring(str.indexOf(BEGIN) + BEGIN.length, str.indexOf(END, str.indexOf(BEGIN)));
    str = str.substring(0, str.indexOf(BEGIN)) + str.substring(str.indexOf(END, str.indexOf(BEGIN)) + END.length);

    objs.push(obj);
}

console.log(objs);
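The second argument to indexOf can also act as a moving cursor, so the string never has to be rebuilt after each cell. This is a minimal sketch of that variation, not the answer's own code; extractCells is a hypothetical helper name, and it assumes bare <td> tags without attributes.

```javascript
// Hypothetical helper: collects the text of every <td>...</td> pair in order.
function extractCells(html) {
    const OPEN = "<td>";
    const CLOSE = "</td>";
    const cells = [];
    let cursor = 0; // position from which the next search starts

    while (true) {
        // indexOf(needle, fromIndex) ignores everything before fromIndex.
        const start = html.indexOf(OPEN, cursor);
        if (start === -1) break;
        const end = html.indexOf(CLOSE, start + OPEN.length);
        if (end === -1) break;

        cells.push(html.substring(start + OPEN.length, end));
        cursor = end + CLOSE.length; // continue after the cell just read
    }
    return cells;
}

const cells = extractCells("<tr><td>01/01/1999</td><td>Item 1</td><td>55</td></tr>");
console.log(cells);
```

Every group of three consecutive cells can then be turned into one { date, name, cost } object.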

As others have suggested:

  1. Create a hidden table
  2. Populate it with the row data
  3. Return a mapped JSON array with fields

const tableString = `
    <tr>
        <td>01/01/1999</td>
        <td>Item 1</td>
        <td>55</td>
    </tr>
    <tr>
        <td>01/01/2000</td>
        <td>Item 2</td>
        <td>35</td>
    </tr>
`;

console.log(tableRowsToJSON(tableString, ['date', 'name', 'cost']));

function tableRowsToJSON(tableRows, fields) {
    let table = document.querySelector('.hidden-table');
    populateTable(emptyTable(table), tableRows);
    return Array.from(table.querySelectorAll('tbody tr')).map(tr => {
        let tds = tr.querySelectorAll('td');
        return fields.reduce((obj, field, index) => {
            return Object.assign(obj, { [field]: tds[index].textContent });
        }, {});
    });
}

function populateTable(table, dataString) {
    if (table.querySelector('tbody') == null) {
        table.appendChild(document.createElement('tbody'));
    }
    table.querySelector('tbody').innerHTML = dataString;
    return table;
}

function emptyTable(table) {
    let tbody = table.querySelector('tbody');
    if (tbody) {
        while (tbody.hasChildNodes()) {
            tbody.removeChild(tbody.lastChild);
        }
    }
    return table;
}

/* CSS */
.as-console-wrapper { top: 0; max-height: 100% !important; }
.hidden-table { display: none; }

<!-- HTML -->
<table class="hidden-table"></table>


As a plugin

You can call this instead:

let parser = new TableRowParser()
console.log(parser.parse(tableString, ['date', 'name', 'cost']))

const tableString = `
    <tr>
        <td>01/01/1999</td>
        <td>Item 1</td>
        <td>55</td>
    </tr>
    <tr>
        <td>01/01/2000</td>
        <td>Item 2</td>
        <td>35</td>
    </tr>
`;

class TableRowParser {
    constructor(config) {
        this.options = Object.assign({}, TableRowParser.defaults, config)
        if (document.querySelector('.' + this.options.selector) == null) {
            let hiddenTable = document.createElement('table')
            hiddenTable.classList.add(this.options.selector)
            document.body.appendChild(hiddenTable)
        }
        this.tableRef = document.querySelector('.' + this.options.selector)
    }
    /* @public */
    parse(dataString, fields) {
        this.__emptyTable()
        this.__populateTable(dataString)
        return Array.from(this.tableRef.querySelectorAll('tbody tr')).map(tr => {
            let tds = tr.querySelectorAll('td')
            return fields.reduce((obj, field, index) => {
                return Object.assign(obj, { [field]: tds[index].textContent })
            }, {});
        });
    }
    /* @private */
    __populateTable(dataString) {
        if (this.tableRef.querySelector('tbody') == null) {
            this.tableRef.appendChild(document.createElement('tbody'))
        }
        this.tableRef.querySelector('tbody').innerHTML = dataString
    }
    /* @private */
    __emptyTable() {
        let tbody = this.tableRef.querySelector('tbody')
        if (tbody) {
            while (tbody.hasChildNodes()) {
                tbody.removeChild(tbody.lastChild)
            }
        }
    }
}

/* @static */
TableRowParser.defaults = {
    selector: 'hidden-table'
}

let parser = new TableRowParser()
console.log(parser.parse(tableString, ['date', 'name', 'cost']))

/* CSS */
.as-console-wrapper { top: 0; max-height: 100% !important; }
.hidden-table { display: none; }

I prefer to use the X-ray npm module for crawling data from HTML pages. For example:

const Xray = require('x-ray');
const x = Xray();

let html = `
    <tr>
        <td>01/01/1999</td>
        <td>Item 1</td>
        <td>55</td>
    </tr>
    <tr>
        <td>01/01/2000</td>
        <td>Item 2</td>
        <td>35</td>
    </tr>
`;

x(html, 'tr', [['td']])
    .then(function(res) {
        console.log(res); // prints one array of cell texts per row
    });

Which will give you:

[ [ '01/01/1999', 'Item 1', '55' ], [ '01/01/2000', 'Item 2', '35' ] ]

The next step is to iterate over this array of arrays and build the JSON you need from it, which should be straightforward for this question.
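That mapping step could look roughly like the sketch below. It starts from the array of arrays shown above; rowsToRecords and the fields list are hypothetical names chosen for illustration, and converting cost with Number is an assumption based on the target object in the question.

```javascript
// Output of the X-ray call above, copied here so the sketch is self-contained.
const rows = [
    ['01/01/1999', 'Item 1', '55'],
    ['01/01/2000', 'Item 2', '35'],
];

// Hypothetical field list: one key per column, in column order.
const fields = ['date', 'name', 'cost'];

// Pair each column value with its field name to build one object per row.
function rowsToRecords(rows, fields) {
    return rows.map(row =>
        Object.fromEntries(fields.map((field, index) => [field, row[index]]))
    );
}

// Assumption: cost should be numeric, as in the object the question asks for.
const records = rowsToRecords(rows, fields)
    .map(record => ({ ...record, cost: Number(record.cost) }));

const json = JSON.stringify(records, null, 2);
console.log(json);
```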

Also, you could use the old table-to-json for converting table-oriented sites straight into pretty JSON.

You can also read the HTML string as XML. DOMParser in XML mode needs well-formed input with a single root element, so wrap the rows in one (here, <record>).

// The <record> wrapper gives the XML parser the single root element it requires.
let tableString = '<record>' +
    '<tr>' +
        '<td>01/01/1999</td>' +
        '<td>Item 1</td>' +
        '<td>55</td>' +
    '</tr>' +
    '<tr>' +
        '<td>01/01/2000</td>' +
        '<td>Item 2</td>' +
        '<td>35</td>' +
    '</tr>' +
'</record>';

let source = (new DOMParser()).parseFromString(tableString, "application/xml");
console.log(source);

// Select cells by tag name: indexing childNodes directly would also count
// any whitespace text nodes that sit between the tags.
let rows = source.documentElement.getElementsByTagName('tr');
for (let id = 0; id < rows.length; id++) {
    let cells = rows[id].getElementsByTagName('td');
    console.log(cells[0].textContent);
    console.log(cells[1].textContent);
    console.log(cells[2].textContent);
}
console.log(rows.length);
