private void EducationWorld_Webscrap_jButtonActionPerformed(java.awt.event.ActionEvent evt)
{
    try
    {
        Document doc = Jsoup.connect("http://www.educationworld.in/institution/mumbai/schools").userAgent("Mozilla/17.0").get();
        Elements links = doc.select("div.instnm.litblue_bg");
        StringBuilder sb1 = new StringBuilder();
        links.stream().forEach(e -> sb1.append(e.text()).append(System.getProperty("line.separator")));
        jTextArea1.setText(sb1.toString());
    }
    catch(Exception e)
    {
        JOptionPane.showMessageDialog(null, e);
    }
}
This displays the data from the first page, but the results are paginated. How can I fetch the data from the next five pages?
Fortunately I've achieved what you're after, as you can see in the code block below. I've added comments that hopefully describe each step if you were not sure what is happening.
I tried playing around with the pagination settings of the site but they seem to only allow increments of 5 results per request so there wasn't much leeway, and you need to pass the starting point before it can retrieve the next 5 results.
Therefore, I've had to include it in a fori loop that runs 32 times, which equates to 158 schools divided by 5, which equals 31.6, or 32 rounded up. Of course, if you only want the first 5 pages, you can change the loop to run only 5 times.
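As a quick check of the arithmetic above, here is a minimal sketch of the page-count calculation and of the start offsets that would be sent on each request. The class and method names (PaginationSketch, pageCount, startOffsets) are made up for illustration; only the figures 158 and 5 come from this answer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original answer.
class PaginationSketch
{
    // Number of requests needed to cover totalItems when each request
    // returns pageSize items (integer ceiling division).
    static int pageCount( int totalItems, int pageSize )
    {
        return ( totalItems + pageSize - 1 ) / pageSize;
    }

    // Start offsets for the first `pages` requests: 0, pageSize, 2 * pageSize, ...
    static List<Integer> startOffsets( int pages, int pageSize )
    {
        final List<Integer> offsets = new ArrayList<>();
        for( int i = 0; i < pages; i++ )
        {
            offsets.add( i * pageSize );
        }
        return offsets;
    }

    public static void main( String[] args )
    {
        System.out.println( pageCount( 158, 5 ) );   // prints 32
        System.out.println( startOffsets( 5, 5 ) );  // prints [0, 5, 10, 15, 20]
    }
}
```

Computing the count this way avoids hard-coding 32 if the number of schools listed on the site changes.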
Anyway, on to the juicy bit:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
import java.io.*;
import java.net.*;

public class Loop
{
    public static void main( String[] args )
    {
        final StringBuilder sb1 = new StringBuilder();
        BufferedReader bufferedReader = null;
        OutputStream outputStream = null;
        try
        {
            // Parameter pagination counts
            int startCount = 0;
            int limitCount = 5;
            // Loop 32 times, 158 schools / 5 (pagination amount)
            for( int i = 0; i < 32; i++ )
            {
                // Open a connection to the supplied URL
                final URLConnection urlConnection = new URL( "http://www.educationworld.in/institution/mumbai/schools" ).openConnection();
                // Tell the connection we are sending a request body (for HTTP this makes it a POST)
                urlConnection.setDoOutput( true );
                // The stream we will be writing to the URL
                outputStream = urlConnection.getOutputStream();
                // Setup parameters for pagination
                final String params = "qstart=" + startCount + "&limit=" + limitCount;
                // Get the bytes of the pagination parameters
                final byte[] outputInBytes = params.getBytes( "UTF-8" );
                // Write the bytes to the URL
                outputStream.write( outputInBytes );
                // Get and read the URL response
                bufferedReader = new BufferedReader( new InputStreamReader( urlConnection.getInputStream() ) );
                StringBuilder response = new StringBuilder();
                String inputLine;
                // Loop over the response and read each line appending it to the StringBuilder
                while( (inputLine = bufferedReader.readLine()) != null )
                {
                    response.append( inputLine );
                }
                // Do the same as before just with a String instead
                final Document doc = Jsoup.parse( response.toString() );
                Elements links = doc.select( "div.instnm.litblue_bg" );
                links.forEach( e -> sb1.append( e.text() ).append( System.getProperty( "line.separator" ) ) );
                // Increment the pagination parameters
                startCount += 5;
                limitCount += 5;
            }
            System.out.println( sb1.toString() );
            // jTextArea1 is not in scope in this standalone main method.
            // Inside your button handler, use: jTextArea1.setText( sb1.toString() );
        }
        catch( Exception e )
        {
            e.printStackTrace();
        }
        finally
        {
            try
            {
                // Close the bufferedReader
                if( bufferedReader != null )
                {
                    bufferedReader.close();
                }
                // Close the outputStream
                if( outputStream != null )
                {
                    outputStream.close();
                }
            }
            catch( IOException e )
            {
                e.printStackTrace();
            }
        }
    }
}
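One caveat with the loop above: the reader and output stream are reassigned on every iteration, but the finally block only closes the last pair. A minimal sketch of one way around that is to read each response in a small helper that uses try-with-resources. The names ReadAllSketch and readAll are hypothetical, and decoding the response as UTF-8 is an assumption about the site, not something confirmed here.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original answer.
class ReadAllSketch
{
    // Reads the whole stream into a String, one line at a time,
    // and closes the reader (and the wrapped stream) when done.
    static String readAll( InputStream in )
    {
        final StringBuilder response = new StringBuilder();
        // Assumption: the response is UTF-8 encoded
        try( BufferedReader reader = new BufferedReader( new InputStreamReader( in, StandardCharsets.UTF_8 ) ) )
        {
            String line;
            while( (line = reader.readLine()) != null )
            {
                response.append( line ).append( '\n' );
            }
        }
        catch( IOException e )
        {
            throw new UncheckedIOException( e );
        }
        return response.toString();
    }

    public static void main( String[] args )
    {
        // Stand-in for urlConnection.getInputStream(), so the sketch runs offline
        final InputStream fake = new ByteArrayInputStream(
            "<div>a</div>\n<div>b</div>".getBytes( StandardCharsets.UTF_8 ) );
        System.out.print( readAll( fake ) );
    }
}
```

Inside the loop, the read step could then become a single call such as readAll( urlConnection.getInputStream() ), with the result passed to Jsoup.parse as before.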
Hopefully this helps and you get the outcome you want. If you need anything explained, just ask!