Parse HTML String to Array

I have an HTML string that contains multiple <p> tags. Within each <p> tag there is a word and its definition.

let data = "<p><strong>Word 1:</strong> Definition of word 1</p><p><strong>Word 2:</strong> Definition of word 2</p>"

My goal is to convert this HTML string into an array of objects that looks like this:

[
 {"word": "Word 1", "definition": "Definition of word 1"},
 {"word": "Word 2", "definition": "Definition of word 2"}
]

This is what I have so far:

var parser = new DOMParser();
var parsedHtml = parser.parseFromString(data, "text/html");
let pTags = parsedHtml.getElementsByTagName("p");
let vocab = [];
pTags.forEach(function(item){
  // This is where I need help to split and convert item into object
  vocab.push(item.innerHTML);
});

As the comment in the code above shows, that is where I'm stuck. Any help is appreciated.

Use textContent to get the text out of an element. The word is in the strong child element, and the definition is the rest of the paragraph's text.

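var parser = new DOMParser();
var parsedHtml = parser.parseFromString(data, "text/html");
let pTags = parsedHtml.getElementsByTagName("p");
let vocab = [];
// getElementsByTagName returns an HTMLCollection, which has no forEach,
// so copy it into a real array first.
Array.from(pTags).forEach(function(item){
  // The <strong> child holds the label, e.g. "Word 1:"
  let label = item.getElementsByTagName("strong")[0].textContent;
  // Whatever remains after removing the label is the definition
  let definition = item.textContent.replace(label, "").trim();
  // Drop the trailing colon so the word matches the desired output
  let word = label.replace(/:\s*$/, "").trim();
  vocab.push({word: word, definition: definition});
});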

A bit ad hoc, but it works.

const data = "<p><strong>Word 1:</strong> Definition of word 1</p><p><strong>Word 2:</strong> Definition of word 2</p>";

// Hard-coded for exactly two <p> entries: split on the tags around each part
const parsedData = [
  {
    "word": data.split('<strong>')[1].split('</strong>')[0].replace(':', '').trim(),
    "definition": data.split('</strong>')[1].split('</p>')[0].trim()
  },
  {
    "word": data.split('</p>')[1].split('<strong>')[1].split('</strong>')[0].replace(':', '').trim(),
    "definition": data.split('</p>')[1].split('</strong>')[1].split('</p>')[0].trim()
  }
];

console.log(parsedData);
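
The split-based snippet above only handles exactly two entries. As a rough sketch of how the same string-level idea could cover any number of entries, the following uses a regular expression with matchAll instead of split. The helper name parseVocab is made up for this illustration and does not come from the answers, and the pattern assumes every entry has exactly the shape <p><strong>Word:</strong> Definition</p>. For markup that is less regular, parsing with DOMParser is the safer route.

```javascript
// Hypothetical helper (not from the original answers): extracts every
// <p><strong>label</strong> definition</p> entry from an HTML string.
function parseVocab(html) {
  // g: find all entries; s: let "." also match newlines inside an entry
  const entry = /<p>\s*<strong>(.*?)<\/strong>(.*?)<\/p>/gs;
  return Array.from(html.matchAll(entry), ([, label, definition]) => ({
    // Remove the trailing colon from the label, e.g. "Word 1:" becomes "Word 1"
    word: label.replace(/:\s*$/, "").trim(),
    definition: definition.trim(),
  }));
}

const data = "<p><strong>Word 1:</strong> Definition of word 1</p><p><strong>Word 2:</strong> Definition of word 2</p>";
console.log(parseVocab(data));
```

Because it works on the raw string, this version also runs outside the browser (for example in Node.js), where DOMParser is not available by default.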

Two things to watch out for:

  • The constructor is DOMParser, not DomParser.
  • getElementsByTagName() returns an HTMLCollection, which has no .forEach() method, so pTags.forEach() throws. Use a for loop instead.
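
To illustrate the .forEach() point without needing a browser, here is a small sketch in which a plain array-like object stands in for the HTMLCollection. That stand-in is an assumption made for the example: like an HTMLCollection it has indexed entries and a length but no forEach, though unlike a real HTMLCollection it cannot be used with for...of.

```javascript
// Stand-in for an HTMLCollection (assumption for this sketch): it has
// indexed entries and a length, but no forEach method.
const collectionLike = { 0: "first", 1: "second", length: 2 };

// Calling collectionLike.forEach(...) would throw a TypeError,
// because the property does not exist.
const hasForEach = typeof collectionLike.forEach === "function";

// Option 1: a classic indexed for loop
const viaLoop = [];
for (let i = 0; i < collectionLike.length; i++) {
  viaLoop.push(collectionLike[i]);
}

// Option 2: copy into a real array, which does have forEach
const viaArrayFrom = [];
Array.from(collectionLike).forEach(function (item) {
  viaArrayFrom.push(item);
});

console.log(hasForEach, viaLoop, viaArrayFrom);
```

Both options visit the same items in the same order, so the choice is mostly a matter of style.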

My solution to your problem:

let data = "<p><strong>Word 1:</strong> Definition of word 1</p><p><strong>Word 2:</strong> Definition of word 2</p>";
var parser = new DOMParser();
var parsedHtml = parser.parseFromString(data, "text/html");
let pTags = parsedHtml.getElementsByTagName("p");
let vocab = [];

for (let p of pTags) {
  // Text of the <strong> child, minus the colon
  const word = p.getElementsByTagName('strong')[0].innerHTML.replace(':', '').trim();
  // Everything left in the paragraph once the <strong> element is removed
  const definition = p.innerHTML.replace(/<strong>.*<\/strong>/, '').trim();
  vocab.push({ word, definition });
}

console.log(vocab);
