Unable to parse image and name from Amazon or Flipkart pages using Jsoup
I am unable to get the main image and name for products on Amazon or Flipkart using Jsoup.
My Java/Jsoup code is:
// For amazon
Connection connection = Jsoup.connect(url).timeout(5000).maxBodySize(1024*1024*10);
Document doc = connection.get();
Elements imgs = doc.select("img#landingImage");
Elements names = doc.select("span#productTitle");
// For flipkart
Connection connection = Jsoup.connect(url).timeout(5000).maxBodySize(1024*1024*10);
Document doc = connection.get();
Elements imgs = doc.select("img.productImage.current"); // main product image
Elements names = doc.select("h1.title"); // product title
Can someone please point out what I am missing here?
The URLs I have used are:
http://www.flipkart.com/lenovo-yoga-2-tablet-android-10-inch/p/itmeyqkznqa2zjf5?pid=TABEYQKXWAXMSGER&srno=b_2&offer=ExchangeOffer_LenovoYoga.&ref=9ea008ab-ae95-4f52-8ef7-3ef1a54947ae
and
http://www.amazon.com/gp/product/B00LZGBU3Y/ref=s9_psimh_gw_p504_d0_i5?pf_rd_m=ATVPDKIKX0DER&pf_rd_s=desktop-1&pf_rd_r=0ESK1KNE31TBRVC8115Q&pf_rd_t=36701&pf_rd_p=1970559082&pf_rd_i=desktop
Also, I would like to do this parsing on the front end, if possible, using JavaScript and jQuery.
Is there a way to do that?
Found the issue.
Jsoup on GAE (Google App Engine) works when the page is fetched through the URL Fetch service via java.net.URL, like this:
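import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

// Fetches the page at the given URL and returns its HTML, skipping blank lines.
private String read(String url) throws IOException {
    URL urlObj = new URL(url);
    StringBuilder sb = new StringBuilder();
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(urlObj.openStream()))) {
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().length() > 0) {
                sb.append(line).append("\n");
            }
        }
    }
    return sb.toString();
}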
Then parse the result with Jsoup as usual:
String html = read(url);
Document doc = Jsoup.parse(html);
This approach works very well.
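As a variation on the read() helper above, here is a minimal sketch that separates the stream reading from the network call and carries over the 5000 ms timeout from the question. The class name PageFetcher and the method names readAll and fetch are made up for illustration, and the User-Agent value is only an assumption; whether either site needs it is not established here. The string returned by fetch can be passed to Jsoup.parse(html) exactly as before.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative, not from any library.
class PageFetcher {

    // Reads the stream as UTF-8 text and drops blank lines, like read() above.
    static String readAll(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    sb.append(line).append('\n');
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Fetches a page through java.net with connect and read timeouts.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(5000); // same 5 s limit as timeout(5000) in the question
        conn.setReadTimeout(5000);
        conn.setRequestProperty("User-Agent", "Mozilla/5.0"); // assumed value, may be unnecessary
        try (InputStream in = conn.getInputStream()) {
            return readAll(in);
        } finally {
            conn.disconnect();
        }
    }
}
```

Keeping readAll separate from fetch means the line-filtering logic can be checked against an in-memory stream without touching the network.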