
Scrape a table using Selenium in Java

I am scraping the transactions of an account from a transaction table. The table's HTML is shown further down.

If I knew the number of rows, I could loop over them and get the required data by using a separate locator for each field.

How can I scrape the whole table? I don't know how many transactions there will be, so I need a way to loop over the rows and scrape every transaction. I am using Selenium with Java for the scraping.

Here is the HTML of the transaction table:

<div id="txn-display"> 
<!-- Transactions start  -->
<!--#include virtual="mobile-statement.shtml" -->
    <table id="txn-display-table">
        <thead>
            <tr>
                <th>Date</th>
                <th colspan="2">Description</th>
                <th>Type</th>
                <th class="amount-cell">Amount Spent  (<em class="WebRupee">Rs.</em>)</th>                        
            </tr>
        </thead>
        <tbody>
            <tr class="gridEven">
                <td>12/02/2019</td>
                <td colspan="2" class="word-break">INTERGLOBE AVIATION LT .             IND</td>
                <td class="txn-type">Debit</td>
                <td class="amount-cell">320</td>                        
            </tr>
            <tr class="gridOdd">
                <td>27/01/2019</td>
                <td colspan="2" class="word-break">PETROL TRXN FEE RVRSL EXCLUDING TAX</td>
                <td class="txn-type">Credit</td>
                <td class="amount-cell">8.21</td>                       
            </tr>
            <tr class="gridEven">
                <td>27/01/2019</td>
                <td colspan="2" class="word-break">SHELL R K R ENTERPRISE BANGALORE     IND</td>
                <td class="txn-type">Debit</td>
                <td class="amount-cell">831.06</td>                     
            </tr>
        </tbody>
    </table>        
</div>

You haven't posted the HTML, so I am using my own example, in which I iterate using the row and column counts. Please check it and let us know if you have any doubts.

package Testng_Pack;

import java.util.concurrent.TimeUnit;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.testng.annotations.AfterTest;
import org.testng.annotations.BeforeTest;
import org.testng.annotations.Test;

public class table {

    WebDriver driver = null;

    @BeforeTest
    public void setup() throws Exception {
        System.setProperty("webdriver.gecko.driver", "D:\\Selenium Files\\geckodriver.exe");
        driver = new FirefoxDriver();
        driver.manage().window().maximize();
        driver.manage().timeouts().implicitlyWait(15, TimeUnit.SECONDS);
        driver.get("Pass the URL here");
    }

    @AfterTest
    public void tearDown() throws Exception {
        driver.quit();
    }

    @Test
    public void print_data() {

        // Get the number of rows in the table.
        int Row_count = driver.findElements(By.xpath("//*[@id='post-body-6522850981930750493']/div[1]/table/tbody/tr")).size();
        System.out.println("Number Of Rows = " + Row_count);

        // Get the number of columns in the table (counted from the first row).
        int Col_count = driver.findElements(By.xpath("//*[@id='post-body-6522850981930750493']/div[1]/table/tbody/tr[1]/td")).size();
        System.out.println("Number Of Columns = " + Col_count);

        // The XPath is split into three parts so the row and column indexes can be inserted.
        String first_part = "//*[@id='post-body-6522850981930750493']/div[1]/table/tbody/tr[";
        String second_part = "]/td[";
        String third_part = "]";

        // Loop over the rows (XPath positions start at 1).
        for (int i = 1; i <= Row_count; i++) {
            // Loop over the columns of the current row.
            for (int j = 1; j <= Col_count; j++) {
                // Build the XPath of the cell at row i, column j.
                String final_xpath = first_part + i + second_part + j + third_part;
                // Read the text of the located cell and print it.
                String Table_data = driver.findElement(By.xpath(final_xpath)).getText();
                System.out.print(Table_data + "  ");
            }
            System.out.println("");
            System.out.println("");
        }
    }
}
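
The XPaths in the code above point at a different page, so here is a rough sketch of how the same "split XPath" idea might be adapted to the table in the question. It only builds the XPath string and needs no browser. The class name CellXPathBuilder and the method cellXPath are made up for illustration, and it assumes the table is located by its id (txn-display-table in the question's HTML) and that the data rows sit directly under tbody.

```java
class CellXPathBuilder {

    // Hypothetical helper: builds the XPath of one cell from the table id and
    // 1-based row/column positions (XPath positions start at 1, not 0).
    static String cellXPath(String tableId, int row, int col) {
        if (row < 1 || col < 1) {
            throw new IllegalArgumentException("Row and column positions are 1-based");
        }
        return "//table[@id='" + tableId + "']/tbody/tr[" + row + "]/td[" + col + "]";
    }

    public static void main(String[] args) {
        // Prints the XPath of the "Type" cell (third td) in the second row.
        System.out.println(cellXPath("txn-display-table", 2, 3));
    }
}
```

The resulting string could then be passed to By.xpath(...) inside the two loops, in place of first_part + i + second_part + j + third_part.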

The code below counts the rows and columns of the table automatically, so you do not need to know in advance how many transactions there are. It works for any table built from tr and td elements; you only have to pass the table's XPath to the code.

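// Imports needed by this test method (place them at the top of the test class):
import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;
import org.testng.annotations.Test;

@Test
public void testWebTable() {
    // Locate the table body through the table's id.
    WebElement simpleTable = driver.findElement(By.xpath("//table[@id='txn-display-table']//tbody"));

    // Get all rows, however many there are.
    List<WebElement> rows = simpleTable.findElements(By.tagName("tr"));
    System.out.println("Number of rows = " + rows.size());

    // Print the data from each row.
    for (WebElement row : rows) {
        List<WebElement> cols = row.findElements(By.tagName("td"));
        for (WebElement col : cols) {
            System.out.print(col.getText() + "\t");
        }
        System.out.println();
    }
}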
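
Once the text of each cell has been read with getText(), it may be handy to turn the strings of one row into typed values instead of only printing them. Below is a minimal sketch of that step; it uses plain Java and no Selenium, so it can be tried on its own. The names TransactionRowParser, Transaction and parseRow are hypothetical. It assumes every row has four cells in the order Date, Description, Type, Amount, that dates look like dd/MM/yyyy as in the sample HTML, and that amounts may contain thousands separators (an assumption; the sample has none).

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;

class TransactionRowParser {

    // Hypothetical holder for the values of one table row.
    static final class Transaction {
        final LocalDate date;
        final String description;
        final String type;
        final BigDecimal amount;

        Transaction(LocalDate date, String description, String type, BigDecimal amount) {
            this.date = date;
            this.description = description;
            this.type = type;
            this.amount = amount;
        }
    }

    // Assumed date layout, taken from the sample rows (e.g. 27/01/2019).
    private static final DateTimeFormatter DATE_FORMAT = DateTimeFormatter.ofPattern("dd/MM/yyyy");

    // Converts the cell texts of one row (Date, Description, Type, Amount) into a Transaction.
    static Transaction parseRow(List<String> cells) {
        if (cells.size() != 4) {
            throw new IllegalArgumentException("Expected 4 cells but got " + cells.size());
        }
        LocalDate date = LocalDate.parse(cells.get(0).trim(), DATE_FORMAT);
        // Collapse the runs of spaces that pad the description.
        String description = cells.get(1).trim().replaceAll("\\s+", " ");
        String type = cells.get(2).trim();
        // Assumption: drop thousands separators such as "1,200.50" before parsing.
        BigDecimal amount = new BigDecimal(cells.get(3).trim().replace(",", ""));
        return new Transaction(date, description, type, amount);
    }

    public static void main(String[] args) {
        Transaction t = parseRow(Arrays.asList(
                "12/02/2019", "INTERGLOBE AVIATION LT .             IND", "Debit", "320"));
        System.out.println(t.date + " | " + t.description + " | " + t.type + " | " + t.amount);
    }
}
```

In the Selenium loop, the list passed to parseRow could be filled by calling getText() on each td of the current row.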
