How to build hierarchical objects from sibling tags using jQuery selectors

I have the HTML snippet below. I want to scrape the page to get the topics and subtopics and store them in objects.

The desired result is something like:

{
'topic': 'Java Basics', 
'subtopics':['Define the scope of variables', 'Define the structure of a Java class', ...]
}

I am trying to make it work with jsdom for Node.js and jQuery:

var jsdom = require('jsdom');
var fs = require("fs");


var topicos = fs.readFileSync("topic.html", "utf-8");

    jsdom.env(topicos, ["http://code.jquery.com/jquery.js"], function (error, window) {
        var $ = window.$;
        var length = $('div ~ ').each(function () {
            //???
            var topic = $(this);
            var text = topic.text();                 
            console.log(text.trim())
        });
    })

But due to my lack of experience with jQuery, I am not able to organize the hierarchy properly.

HTML snippet:

<div>
    <strong>Java Basics&nbsp;</strong></div>
<ul>
    <li>
        Define the scope of variables&nbsp;</li>
    <li>
        Define the structure of a Java class
    </li>
    <li>
        Create executable Java applications with a main method; run a Java program from the command line; including
        console output.
    </li>
    <li>
        Import other Java packages to make them accessible in your code
    </li>
    <li>
        Compare and contrast the features and components of Java such as:
        platform independence, object orientation, encapsulation, etc.
    </li>
</ul>
<div>
    <strong>Working With Java Data Types&nbsp;</strong></div>
<ul>
    <li>
        Declare and initialize variables (including casting of primitive data types)
    </li>
    <li>
        Differentiate between object reference variables and primitive variables
    </li>
    <li>
        Know how to read or write to object fields
    </li>
    <li>
        Explain an Object's Lifecycle (creation, "dereference by reassignment" and garbage collection)
    </li>
    <li>
        Develop code that uses wrapper classes such as Boolean, Double, and Integer. &nbsp;</li>
</ul>
 ...

Here is a working snippet (fiddle):

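var topicos = [];

// Each topic <div> is followed by a sibling <ul> holding its subtopics.
jQuery('div').each(function () {
  var jTopic = jQuery(this);
  var data = {
    // trim() drops the trailing &nbsp; and surrounding whitespace
    topic: jTopic.find('strong').text().trim(),
    subtopics: []
  };
  // .next('ul') matches the immediately following sibling only if it is a <ul>.
  jTopic.next('ul').find('li').each(function () {
    data.subtopics.push(jQuery(this).text().trim());
  });
  topicos.push(data);
});

console.log(topicos);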
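
The pairing logic behind `.each()` plus `.next('ul')` can be sketched without a DOM. This is only an illustration, not part of the fiddle: the `siblings` array and the `groupTopics` function are hypothetical stand-ins for the parsed page and the jQuery traversal.

```javascript
// Hypothetical stand-in for the page: top-level siblings in document order.
const siblings = [
  { tag: 'div', text: 'Java Basics' },
  { tag: 'ul', items: ['Define the scope of variables', 'Define the structure of a Java class'] },
  { tag: 'div', text: 'Working With Java Data Types' },
  { tag: 'ul', items: ['Know how to read or write to object fields'] },
];

// Pair every <div> with the <ul> that immediately follows it,
// mirroring jQuery('div').each(...) combined with .next('ul').
function groupTopics(nodes) {
  const topics = [];
  nodes.forEach((node, i) => {
    if (node.tag !== 'div') return; // only <div> elements start a topic
    const next = nodes[i + 1];
    // Like .next('ul'): use the next sibling only when it is a <ul>.
    const subtopics = next && next.tag === 'ul' ? next.items.slice() : [];
    topics.push({ topic: node.text, subtopics });
  });
  return topics;
}

const result = groupTopics(siblings);
console.log(JSON.stringify(result, null, 2));
```

A `<div>` with no `<ul>` right after it ends up with an empty `subtopics` array, which matches how `.next('ul')` returns an empty set in that case.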

However, I would strongly recommend adding classes to your markup and using them as selectors instead of tag names:

<div class="js-topic-data">
  <div>
    <strong class="js-topic">Java Basics&nbsp;</strong>
  </div>
  <ul>
    <li class="js-sub-topic">
       Define the scope of variables&nbsp;</li>
  </ul>
</div>

Then you could do something like:

var topicos = [];

jQuery('.js-topic-data').each(function () {
  var jThis = jQuery(this);
  var data = {
    topic: jThis.find('.js-topic').text().trim(),
    subtopics: []
  };
  // The subtopics are descendants of the wrapper, so use .find() rather than .next().
  jThis.find('.js-sub-topic').each(function () {
    data.subtopics.push(jQuery(this).text().trim());
  });
  topicos.push(data);
});

This is much more robust against markup changes.
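
With either approach, the text of each element in the question's markup still contains `&nbsp;` characters and line breaks from the source indentation. A minimal sketch of a normalizing helper, assuming a hypothetical function name `cleanText` that is not part of jQuery:

```javascript
// Hypothetical helper: collapse runs of whitespace (including the
// non-breaking space produced by &nbsp;) into single spaces, then trim.
function cleanText(raw) {
  return raw.replace(/\s+/g, ' ').trim();
}

// Sample input shaped like the text of one multi-line <li> from the question.
const rawItem =
  '\n        Create executable Java applications with a main method;' +
  ' run a Java program from the command line; including\n' +
  '        console output.\u00A0\n    ';

console.log(cleanText(rawItem));
console.log(cleanText('Java Basics\u00A0'));
```

Calling `cleanText(jQuery(this).text())` inside the loops would then yield single-line strings like those in the desired result.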
