Pagination in web scraping using jsoup in Java Swing
private void EducationWorld_Webscrap_jButtonActionPerformed(java.awt.event.ActionEvent evt)
{
    try
    {
        Document doc = Jsoup.connect("http://www.educationworld.in/institution/mumbai/schools").userAgent("Mozilla/17.0").get();
        Elements links = doc.select("div.instnm.litblue_bg");
        StringBuilder sb1 = new StringBuilder();
        links.stream().forEach(e -> sb1.append(e.text()).append(System.getProperty("line.separator")));
        jTextArea1.setText(sb1.toString());
    }
    catch(Exception e)
    {
        JOptionPane.showMessageDialog(null, e);
    }
}
This displays the data, but the results are paginated. How do I get the data from the next five pages?
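Fortunately, I have achieved what you are after, as you can see in the code block below. In case you are unsure what is happening, I have added comments that hopefully describe each step.
I tried playing with the site's pagination settings, but they only seem to allow increments of 5 results per request, so there is not much leeway, and you need to pass the starting point to retrieve the next 5 results.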
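Because of that, I had to wrap the request in a for loop that runs 32 times: 158 schools divided by 5 per request equals 31.6, which rounds up to 32. Of course, if you only want the first 5 pages, you can change the loop to run just 5 times.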
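The page count here is just a ceiling division. A minimal sketch of that arithmetic follows; the names `PageMath`, `pageCount` and `startOffsets` are made up for illustration, and 158 and 5 are the figures quoted in this answer:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original answer
class PageMath
{
    // Ceiling division: number of requests needed to cover every result
    static int pageCount( int totalResults, int pageSize )
    {
        return ( totalResults + pageSize - 1 ) / pageSize;
    }

    // Starting offset (the qstart value) for each request
    static List<Integer> startOffsets( int pages, int pageSize )
    {
        List<Integer> offsets = new ArrayList<>();
        for( int i = 0; i < pages; i++ )
        {
            offsets.add( i * pageSize );
        }
        return offsets;
    }

    public static void main( String[] args )
    {
        System.out.println( pageCount( 158, 5 ) );   // 32
        System.out.println( startOffsets( 5, 5 ) );  // [0, 5, 10, 15, 20]
    }
}
```

If the number of schools on the site changes, only the first argument to `pageCount` needs updating.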
Anyway, on to the juicy bit:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
import java.io.*;
import java.net.*;
public class Loop
{
    public static void main( String[] args )
    {
        final StringBuilder sb1 = new StringBuilder();
        BufferedReader bufferedReader = null;
        OutputStream outputStream = null;
        try
        {
            // Parameter pagination counts
            int startCount = 0;
            int limitCount = 5;
            // Loop 32 times, 158 schools / 5 (pagination amount)
            for( int i = 0; i < 32; i++ )
            {
                // Open a connection to the supplied URL
                final URLConnection urlConnection = new URL( "http://www.educationworld.in/institution/mumbai/schools" ).openConnection();
                // Tell the URL we are sending output
                urlConnection.setDoOutput( true );
                // The stream we will be writing to the URL
                outputStream = urlConnection.getOutputStream();
                // Setup parameters for pagination
                final String params = "qstart=" + startCount + "&limit=" + limitCount;
                // Get the bytes of the pagination parameters
                final byte[] outputInBytes = params.getBytes( "UTF-8" );
                // Write the bytes to the URL
                outputStream.write( outputInBytes );
                // Get and read the URL response
                bufferedReader = new BufferedReader( new InputStreamReader( urlConnection.getInputStream() ) );
                StringBuilder response = new StringBuilder();
                String inputLine;
                // Loop over the response and read each line appending it to the StringBuilder
                while( (inputLine = bufferedReader.readLine()) != null )
                {
                    response.append( inputLine );
                }
                // Do the same as before just with a String instead
                final Document doc = Jsoup.parse( response.toString() );
                Elements links = doc.select( "div.instnm.litblue_bg" );
                links.forEach( e -> sb1.append( e.text() ).append( System.getProperty( "line.separator" ) ) );
                // Increment the pagination parameters
                startCount += 5;
                limitCount += 5;
            }
            System.out.println( sb1.toString() );
            // Inside the Swing handler from the question, call
            // jTextArea1.setText( sb1.toString() ) here instead; jTextArea1 does not exist in this standalone class
        }
        catch( Exception e )
        {
            e.printStackTrace();
        }
        finally
        {
            try
            {
                // Close the bufferedReader
                if( bufferedReader != null )
                {
                    bufferedReader.close();
                }
                // Close the outputStream
                if( outputStream != null )
                {
                    outputStream.close();
                }
            }
            catch( IOException e )
            {
                e.printStackTrace();
            }
        }
    }
}
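One thing to watch in the loop above: bufferedReader and outputStream are reassigned on every pass, so the finally block only closes the last pair. Below is a rough sketch of the first-five-pages variant that closes each request's streams with try-with-resources. `PagedFetch`, `formBody` and `post` are hypothetical names, the timeouts and the UTF-8 response charset are assumptions, and the qstart/limit values simply mirror what the loop above sends; how the site actually interprets them is not verified here. Only the JDK is used, so the returned HTML would still be handed to Jsoup.parse as before.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the original answer's class
class PagedFetch
{
    static final String URL_TEXT = "http://www.educationworld.in/institution/mumbai/schools";

    // Same values as the loop above: both parameters grow by 5 per page
    static String formBody( int page )
    {
        return "qstart=" + ( page * 5 ) + "&limit=" + ( ( page + 1 ) * 5 );
    }

    // POST the body and return the response text; streams are closed per request
    static String post( String url, String body ) throws IOException
    {
        URLConnection connection = new URL( url ).openConnection();
        connection.setDoOutput( true );
        connection.setConnectTimeout( 10_000 ); // arbitrary timeout (assumption)
        connection.setReadTimeout( 10_000 );    // arbitrary timeout (assumption)
        try( OutputStream out = connection.getOutputStream() )
        {
            out.write( body.getBytes( StandardCharsets.UTF_8 ) );
        }
        StringBuilder response = new StringBuilder();
        // Assumes the response is UTF-8 encoded
        try( BufferedReader reader = new BufferedReader(
                new InputStreamReader( connection.getInputStream(), StandardCharsets.UTF_8 ) ) )
        {
            String line;
            while( ( line = reader.readLine() ) != null )
            {
                response.append( line ).append( '\n' );
            }
        }
        return response.toString();
    }

    public static void main( String[] args )
    {
        // First five pages only
        for( int page = 0; page < 5; page++ )
        {
            try
            {
                String html = post( URL_TEXT, formBody( page ) );
                // Pass html to Jsoup.parse( html ) and select the elements as before
                System.out.println( "page " + page + ": " + html.length() + " characters" );
            }
            catch( IOException e )
            {
                System.err.println( "page " + page + " failed: " + e.getMessage() );
            }
        }
    }
}
```

Catching the exception per page means one failed request does not throw away the pages already fetched.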
Hope this helps and that you get the result you want. If you need anything explained, just ask!