JSoup format output of children

I am parsing a document (with JSoup) that contains many entries like this:

 <span class="chart_position position-up position-greatest-gains">1</span>
            <h1>Beauty And A Beat</h1>
        <p class="chart_info">
      <a href="/artist/305459/justin-bieber">Justin Bieber Featuring Nicki Minaj</a>            <br>
      Beauty and a Beat          </p>

I can extract two separate lists, one of song titles and one of artists, like this:

    Elements song = doc.select("div.chart_listing h1");
    System.out.println("song: " + song);

    Elements li = doc.select("p.chart_info a");
    System.out.println("artists: " + li.text());

However, the output now looks like this:

<h1>The Lucky Ones</h1>
<h1>Scream &amp; Shout</h1>
<h1>Clarity</h1>
<h1>We Are Young</h1>
<h1>Va Va Voom</h1>
<h1>Catch My Breath</h1>
<h1>I Found You</h1>
<h1>Sorry</h1>
<h1>Leaving</h1>
artists: Justin Bieber Featuring Nicki Minaj Kerli will.i.am & Britney Spears Nicki Minaj Kelly Clarkson The Wanted Ciara Pet Shop Boys

I would like it to look like this:

1 - Song - Artist
2 - Song - Artist
etc

I have been looking at related posts and tried this, but I have not quite figured it out:

        Elements parents = doc.select("div.chart_listing h1");
        for (Element parent : parents)
        {
            Elements categories = parent.select("p.chart_info a");
            System.out.print("song: " + parent + " - ");
            System.out.print("artist: " + categories.text() + "\n");
        }

This currently prints a blank artist for every song, like this:

song: <h1>Beauty And A Beat</h1> - artist: 
song: <h1>The Lucky Ones</h1> - artist: 
song: <h1>Scream &amp; Shout</h1> - artist: 

Two main questions remain open:

  • How do I print the artist belonging to each song? Why is it blank?
  • How do I add the numbering? (This is secondary, but would be nice.)

Thanks so much!

---EDIT

I solved the first problem by selecting from a larger parent element:

    Elements parents = doc.select("article.song_review");
    for (Element parent : parents)
    {
        Elements titles = parent.select("h1");
        Elements categories = parent.select("p.chart_info a");
        System.out.print("song: " + titles + " - ");
        System.out.print("artist: " + categories.text() + "\n");
    }

Now the output looks like this:

song: <h1>Beauty And A Beat</h1> - artist: Justin Bieber Featuring Nicki Minaj
song: <h1>The Lucky Ones</h1> - artist: Kerli
song: <h1>Scream &amp; Shout</h1> - artist: will.i.am & Britney Spears
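
As far as I can tell, the artist was blank in my first attempt because `select` called on an element only searches that element and its descendants. The `<p class="chart_info">` is a sibling of the `<h1>`, not a child, so nothing matched. Below is a minimal sketch of the difference; the `SelectScopeDemo` class, its method names, and the trimmed-down sample HTML are made up for illustration, and the `<article class="song_review">` wrapper is assumed from the selector above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SelectScopeDemo {
    // Hypothetical sample: one chart entry, wrapped in the article element
    // that the selector "article.song_review" assumes.
    static final String HTML =
        "<article class=\"song_review\">"
        + "<h1>Beauty And A Beat</h1>"
        + "<p class=\"chart_info\">"
        + "<a href=\"/artist/305459/justin-bieber\">Justin Bieber Featuring Nicki Minaj</a>"
        + "</p>"
        + "</article>";

    // Searches inside the <h1> only. The <p> is a sibling, so nothing matches
    // and text() on the empty result is an empty string.
    static String artistFromHeading() {
        Document doc = Jsoup.parse(HTML);
        Element heading = doc.select("h1").first();
        return heading.select("p.chart_info a").text();
    }

    // Searches inside the enclosing <article>, which contains both the
    // <h1> and the <p>, so the artist link is found.
    static String artistFromArticle() {
        Document doc = Jsoup.parse(HTML);
        Element article = doc.select("article.song_review").first();
        return article.select("p.chart_info a").text();
    }

    public static void main(String[] args) {
        System.out.println("from h1:      [" + artistFromHeading() + "]");
        System.out.println("from article: [" + artistFromArticle() + "]");
    }
}
```

Starting the loop from any element that encloses both the title and the artist should work the same way; `article.song_review` just happens to be the closest one on this page.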

Any idea how to clear up the &amp; and add numbering?

I solved both by reading the chart position from the page and calling text() on every selection:

    Elements parents = doc.select("article.song_review");
    for (Element parent : parents)
    {
        Elements position = parent.select("span.chart_position");
        Elements titles = parent.select("h1");
        Elements categories = parent.select("p.chart_info a");
        System.out.print("position: " + position.text() + " - song: " + titles.text() + " - ");
        System.out.print("artist: " + categories.text() + "\n");
    }
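
For completeness, here is a self-contained sketch that produces the `1 - Song - Artist` format I originally asked for. The `&amp;` showed up earlier because concatenating an `Elements` object into a string uses its HTML form, where `&` is escaped; `text()` returns the decoded text instead. The `ChartFormatter` class, the `formatChart` method, and the sample HTML (including position 3 for the second song) are assumptions for illustration, not the real page.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ChartFormatter {
    // Hypothetical sample with two entries; the positions are illustrative.
    static final String SAMPLE =
        "<article class=\"song_review\">"
        + "<span class=\"chart_position position-up\">1</span>"
        + "<h1>Beauty And A Beat</h1>"
        + "<p class=\"chart_info\"><a href=\"#\">Justin Bieber Featuring Nicki Minaj</a></p>"
        + "</article>"
        + "<article class=\"song_review\">"
        + "<span class=\"chart_position\">3</span>"
        + "<h1>Scream &amp; Shout</h1>"
        + "<p class=\"chart_info\"><a href=\"#\">will.i.am &amp; Britney Spears</a></p>"
        + "</article>";

    // Builds one "position - song - artist" line per article.song_review.
    static List<String> formatChart(String html) {
        Document doc = Jsoup.parse(html);
        List<String> lines = new ArrayList<>();
        for (Element entry : doc.select("article.song_review")) {
            // text() yields decoded text, so "&amp;" comes out as "&".
            String position = entry.select("span.chart_position").text();
            String title = entry.select("h1").text();
            String artist = entry.select("p.chart_info a").text();
            lines.add(position + " - " + title + " - " + artist);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : formatChart(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

If the page had no position element, a plain loop counter would give the same numbering, assuming the entries appear in chart order.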
