Skipping TR elements when using selenium python

I am using Selenium with Python to scrape a webpage. I want to skip the first two TR elements in the table because they are the header and title rows. Is there a way in Selenium, or a Pythonic way, to skip the first two TR elements?

I have tried using the specific XPath of the TR I want to start on, but that returns only that one TR rather than all of the TRs from that point on.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import statistics
import requests
import json
import numpy as np
import pandas as pd
import xlsxwriter

browser = webdriver.Chrome("/ProgramData/chocolatey/bin/chromedriver.exe")


browser.get(
    "http://rotoguru1.com/cgi-bin/hyday.pl?mon=10&day=22&year=2019&game=fd")

table_rows = browser.find_element_by_xpath(
    '/html/body/table/tbody/tr/td[3]/table[4]').find_element_by_tag_name('tbody').find_elements_by_tag_name('tr')

players = []

for row in table_rows:
    cells = row.find_elements_by_tag_name('td')
    pos = cells[0].text
    print(pos)
    name = cells[1].text
    print(name)
    fpts = cells[2].text
    salary = cells[3].text
    team = cells[4].text
    opp = cells[5].text
    minutes = cells[7].text
    players.append([pos, name, fpts, salary, team, opp, minutes])

df = pd.DataFrame(players, columns=[
    "Position", "Name", "FPTS", "Salary", "Team", "Opponent", "Minutes"])
writer = pd.ExcelWriter('NBA_Stats', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')
df.style.set_properties(**{'text-align': 'center'})
pd.set_option('display.max_colwidth', 100)
pd.set_option('display.width', 1000)
print(players)
writer.save()

To skip the first two rows, change your for loop to:

for r, row in enumerate(table_rows):
    if r < 2:
        continue

and keep the rest of the loop body unchanged.
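
A self-contained sketch of that loop, for anyone who wants to try it without a browser. It uses plain lists of strings as stand-ins for the Selenium row and cell elements; the name `rows` and all of the sample data are made up for illustration, and the column layout is only assumed to resemble the real table.

```python
# Stand-in data: each inner list plays the role of the <td> texts of one <tr>.
# The first two entries mimic the header and title rows (hypothetical values).
rows = [
    ["Guards"],
    ["Pos", "Name", "FD Pts", "Salary", "Team", "Opp", "Score", "Min"],
    ["PG", "Player A", "40.1", "$8,000", "aaa", "v bbb", "100-90", "35:00"],
    ["SG", "Player B", "22.5", "$5,500", "bbb", "@ aaa", "90-100", "28:30"],
]

players = []
for r, cells in enumerate(rows):
    if r < 2:
        continue  # skip the header and title rows
    # Same indices as in the question: columns 0-5, then column 7.
    pos, name, fpts, salary, team, opp = cells[:6]
    minutes = cells[7]
    players.append([pos, name, fpts, salary, team, opp, minutes])

print(players)
```

Without the `continue`, the first stand-in row would fail on `cells[7]` because it has only one cell, which is the kind of problem skipping the header rows avoids.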

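Can you check whether the XPath below works for you? Each [not(position()=1)] predicate drops the first remaining row, so chaining two of them skips the first two rows.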

//body//table[4]/tbody//tr[not(position()=1)][not(position()=1)]

find_elements_by_tag_name() returns a list, so you can use any regular list operation on it. For example, you can slice the list:

for row in table_rows[2:]:

This will skip the first two rows.
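
A small sketch of the slicing behaviour, using a list of strings as a stand-in for the row elements (the sample values are made up). It also shows `itertools.islice` from the standard library, which is an alternative not mentioned above and is only needed if the rows come from an iterator rather than a list.

```python
from itertools import islice

# Stand-in for the list of <tr> elements returned by Selenium.
table_rows = ["title", "header", "row 1", "row 2", "row 3"]

# Slicing builds a new list without the first two items.
sliced = table_rows[2:]

# islice does the same for any iterable, including ones that cannot be sliced.
lazy = list(islice(iter(table_rows), 2, None))

# Slicing past the end of a short list gives an empty list rather than an error,
# so the loop body is simply never entered.
short = ["title"]
for row in short[2:]:
    print(row)

print(sliced)
print(lazy)
```

Slicing copies the list, which is negligible for a table of this size; the `enumerate` approach avoids the copy if that ever matters.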

