Jsoup not fully fetching the raw HTML code

Question

I'm trying to fetch some lyrics from genius.com (I know they have an API; I'm doing it manually), but I don't seem to get the same HTML string every time. It only seems to work about 50% of the time.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import java.io.IOException;

public class Fetch_lyrics {
    public static void testing() {
        try {

            String urll = "https://genius.com/In-mourning-debris-lyrics";
            Document doc = Jsoup.connect(urll).maxBodySize(0).get();
            String text = doc.select("p").first().toString();
            System.out.println(text);

        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

I printed the raw HTML via the doc variable, and it seems that about 50% of the time the raw HTML string contains no <p> element (I'm not sure that is the right term for it). Thanks in advance.

Answer 1

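It looks like genius.com returns different content for new users. I got two different versions of the page: one on my first visit, and another after I cleared cookies in my browser (Chrome) and visited again.

I suggest using two selectors to get the information you need.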

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.IOException;

class Outer {

    public static void main(String[] args) {
        try {
            String urll = "https://genius.com/In-mourning-debris-lyrics";
            Document doc = Jsoup.connect(urll).maxBodySize(0).get();
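            // First page variant: the lyrics sit in a <p> element.
            Element first = doc.selectFirst("p");
            if (first == null) {
                // Second variant: no <p>, so fall back to the lyrics container div.
                first = doc.selectFirst("div[class^=Lyrics__Container]");
            }
            // Print only if one of the two selectors matched.
            if (first != null) {
                System.out.println(first.text());
            }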
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
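
If the page ever ships a third layout, the chain of null checks above can be generalized. The following is a minimal sketch, not part of Jsoup: `SelectorFallback` and `firstMatch` are hypothetical names, and the map in `main` is only a stand-in for a parsed page so the example runs without a network call.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper, not part of Jsoup: tries selectors in order and
// returns the first non-null result produced by the lookup function.
class SelectorFallback {

    static <T> Optional<T> firstMatch(Function<String, T> lookup, List<String> selectors) {
        for (String selector : selectors) {
            T hit = lookup.apply(selector);
            if (hit != null) {
                return Optional.of(hit);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for the page variant that has no <p> element: only the
        // container selector resolves to something.
        Map<String, String> fakePage = Map.of("div[class^=Lyrics__Container]", "lyrics text");
        List<String> selectors = List.of("p", "div[class^=Lyrics__Container]");

        Optional<String> found = firstMatch(fakePage::get, selectors);
        System.out.println(found.orElse("no selector matched"));
    }
}
```

With Jsoup, the same helper could presumably be called as `firstMatch(doc::selectFirst, List.of("p", "div[class^=Lyrics__Container]"))`, since `selectFirst` returns null when nothing matches.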

Solution 1
0 | accepted | 2021-01-24 20:36:49
