I am trying to fetch some lyrics from genius.com (I know they have an API; I am doing it manually), but I don't seem to get the same HTML string every time. In fact, I put the code below in a for loop and it seems to work only 50% of the time.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import java.io.IOException;

public class Fetch_lyrics {
    public static void testing() {
        try {
            String urll = "https://genius.com/In-mourning-debris-lyrics";
            Document doc = Jsoup.connect(urll).maxBodySize(0).get();
            String text = doc.select("p").first().toString();
            System.out.println(text);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
I printed the raw HTML via the doc variable, and it seems that around 50% of the time the raw HTML string doesn't have the <p> class (I don't know if it's called a class or something else) that contains the lyrics. Thanks in advance.
It looks like genius.com returns different content to new users. I saw two different versions of the page: one when I visited for the first time, and another when I cleared cookies in the browser (Chrome) and visited again.
I recommend using two selectors, so that you get the information you need from whichever version is returned.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import java.io.IOException;

class Outer {
    public static void main(String[] args) {
        try {
            String urll = "https://genius.com/In-mourning-debris-lyrics";
            Document doc = Jsoup.connect(urll).maxBodySize(0).get();
            // First try the <p> element used by one version of the page.
            Element first = doc.selectFirst("p");
            if (first == null) {
                // Fall back to the container whose class starts with "Lyrics__Container".
                first = doc.selectFirst("div[class^=Lyrics__Container]");
            }
            // Print only if one of the selectors matched, which avoids a NullPointerException.
            if (first != null) {
                System.out.println(first.text());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
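If the site ever serves a third layout, the same fallback idea can be written as a loop over an ordered list of selectors. The following is only a minimal sketch under assumptions: the class name SelectorFallback, the method firstMatch, and the simulated page are all hypothetical and not part of jsoup. The lookup function stands in for a real query so that the example runs without network access.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

public class SelectorFallback {

    /**
     * Tries each CSS selector in order and returns the first non-blank result.
     * The lookup function is a stand-in (an assumption of this sketch) for a
     * real query such as jsoup's doc.selectFirst(css); it should return null
     * when nothing matches.
     */
    static Optional<String> firstMatch(List<String> selectors,
                                       Function<String, String> lookup) {
        for (String css : selectors) {
            String text = lookup.apply(css);
            if (text != null && !text.trim().isEmpty()) {
                return Optional.of(text);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Simulated page in the alternate layout: no <p>, only the lyrics container.
        Function<String, String> fakePage = css ->
                css.equals("div[class^=Lyrics__Container]") ? "lyrics text" : null;

        List<String> selectors = Arrays.asList("p", "div[class^=Lyrics__Container]");
        System.out.println(firstMatch(selectors, fakePage).orElse("(no lyrics found)"));
    }
}
```

With jsoup, the lookup could be something like `css -> { Element e = doc.selectFirst(css); return e == null ? null : e.text(); }`, so adding another layout only means adding another selector to the list.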