
How to extract HTML markup from plain text

I have a bunch of HTML data, stored as plain text, that I got from CKEditor. It contains table structures and a lot of other HTML markup.

I was wondering if there is a way to extract only the table structure and the td data.

The plain text could look something like this:

first table....bunch more texts here...

<table>
   <tr><td>  data1  </td></tr>
   <tr><td>  data2  </td></tr>
   <tr><td>  data3  </td></tr>
</table>

end of table. test data here...

<table>
   <tr><td>  data4  </td></tr>
   <tr><td>  data5  </td></tr>
   <tr><td>  data6  </td></tr>
</table>

end of second table and bunch more texts....

I have tried:

//tableData contains everything the user types in CKEditor.
var table = tableData.getElementsByTagName ('table');

but I just realized that this text is not in the DOM. It is just plain text that I extract from CKEditor, so it has no getElementsByTagName method.

How do I extract the table data?

Thanks for the help!

I assume that you have a tableData string from which you want to extract DOM nodes so that you can work with them.

To avoid writing a parser yourself, you can insert this string into a temporary DOM element and let the browser parse it.

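// tableData is the HTML string from CKEditor. Assigning it to innerHTML
// makes the browser parse it; the element is never attached to the page.
var temp = document.createElement( 'div' );
temp.innerHTML = tableData;

// Retrieve all tables.
var tables = temp.getElementsByTagName( 'table' );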
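If this has to run where there is no DOM at all (for example in Node.js, where `document` does not exist), a rough fallback is to pull the cells out with regular expressions. This is a different technique from the one above and is only a sketch: it assumes simple, non-nested tables like the CKEditor sample in the question (no nested tables, no `>` inside attribute values, no HTML comments), and the function name `extractTablesFromString` is made up for illustration. A real parser is more robust whenever one is available.

```javascript
// Hypothetical helper: regex-based extraction of <td> text from an HTML string.
// Assumption: flat, well-formed tables such as the CKEditor sample; nested
// tables, comments or '>' inside attribute values will confuse it.
function extractTablesFromString(html) {
  const tables = [];
  // Each <table ...>...</table> block, matched lazily and case-insensitively.
  const tableRe = /<table\b[^>]*>([\s\S]*?)<\/table>/gi;
  for (const tableMatch of html.matchAll(tableRe)) {
    const rows = [];
    const rowRe = /<tr\b[^>]*>([\s\S]*?)<\/tr>/gi;
    for (const rowMatch of tableMatch[1].matchAll(rowRe)) {
      const cells = [];
      const cellRe = /<td\b[^>]*>([\s\S]*?)<\/td>/gi;
      for (const cellMatch of rowMatch[1].matchAll(cellRe)) {
        // Strip inline tags (e.g. <strong>) and the surrounding whitespace.
        cells.push(cellMatch[1].replace(/<[^>]*>/g, '').trim());
      }
      rows.push(cells);
    }
    tables.push(rows);
  }
  return tables;
}
```

The result is an array with one entry per table, each holding an array of rows of trimmed cell strings, so for the sample text in the question it yields two tables of three one-cell rows each.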

You want something like the following, which pulls all of the tables into an array and then iterates over them. Here is a fiddle that shows this in action as well: http://jsfiddle.net/M5nMY/. I am assuming here that tabledata is the id of the DOM element containing the tables. If you only have the string, load it into a temporary element first (by assigning it to that element's innerHTML) and search that element instead.

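// Assumes the tables live inside an element with id="tabledata".
var tableData = document.getElementById('tabledata');
var tables = tableData.getElementsByTagName('table');
var data = [];
for (var k = 0; k < tables.length; k++) {
   var table = tables[k];
   // table.rows and rows[i].cells are the built-in HTMLTableElement collections.
   for (var i = 0; i < table.rows.length; i++) {
      var row = table.rows[i].cells;
      for (var j = 0; j < row.length; j++) {
         // innerHTML keeps any markup (and whitespace) inside the cell.
         data.push(row[j].innerHTML);
      }
   }
}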

Note that in this case I have pushed the data from all of the tables into a single flat array, cell by cell, going through the rows in order.
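If you would rather keep each table and each row separate instead of flattening everything, a small variation can build nested arrays. This is a sketch under two assumptions: the name `collectTableData` is hypothetical, and it reads `textContent` (trimmed) rather than `innerHTML`, on the guess that you want the plain cell text without inner markup. It only relies on the shape of a DOM table (array-like `rows`, each with array-like `cells`), so it accepts the HTMLCollection returned by `getElementsByTagName('table')`.

```javascript
// Hypothetical helper: cell text grouped per table and per row.
// `tables` is array-like (e.g. an HTMLCollection); each table has array-like
// `rows`, each row has array-like `cells`, each cell exposes `textContent`.
function collectTableData(tables) {
  return Array.from(tables, (table) =>
    Array.from(table.rows, (row) =>
      // textContent drops tags inside the cell; trim() removes the padding
      // spaces seen in the CKEditor sample.
      Array.from(row.cells, (cell) => cell.textContent.trim())
    )
  );
}
```

In the browser you would call it on the tables found in the temporary element, e.g. `collectTableData(temp.getElementsByTagName('table'))`; the value at `[k][i][j]` is then the text of cell `j` in row `i` of table `k`.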
