Reliable alternative to `time.sleep` [As job does not scrape data without it]

Question

I have a job that tends to work only when I use time.sleep. However, I want something quicker than time.sleep(2): a fixed delay is slow, and it would still not be enough on a slow internet connection or on my laptop, which is slow.

Full code here.

The job works for:

indexes = list(range(len(options)))
shuffle(indexes)
for index in indexes:
    time.sleep(5)
    driver.get('https://www.bet365.com.au/#/AS/B1/')
    clickMe = wait(driver, 10).until(EC.element_to_be_clickable((By.XPATH,'(//div[div/div/text()="Main Lists"]//div[starts-with(@class, "sm-CouponLink_Label") and normalize-space()])[%s]' % str(index + 1))))
    clickMe.click()
    time.sleep(3)

With time.sleep set to 0, the job finishes without errors but performs no scraping or other actions.

Unfortunately, this gives me an error:

EC.presence_of_element_located((By.css_selector, "#TopPromotionBetNow"))
WebDriverWait(driver, timeout).until(element_present)

clickMe = wait(driver, 10).until(EC.element_to_be_clickable((By.XPATH,'(//div[div/div/text()="Main Lists"]//div[starts-with(@class, "sm-CouponLink_Label") and normalize-space()

The wait above does not seem to have any effect either.

Any ideas on how I can make the job wait until the page is fully loaded, so that it scrapes, navigates and clicks successfully?

Answer 1

You can use visibility_of_all_elements_located as the expected condition:

langs2 = wait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, '//a[contains(@class, "tb_header-bar tb_")]')))
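
As background on why an explicit wait can replace a fixed time.sleep: it re-checks a condition at short intervals and returns as soon as the condition holds, so a fast page costs almost nothing and a slow page gets the full timeout. Below is a minimal standard-library sketch of that polling idea. It is not Selenium's implementation; `wait_until`, its parameters and the simulated "page ready" condition are hypothetical names used only for illustration.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have elapsed without the condition becoming truthy.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)


# Simulated "page" that becomes ready about 0.2 s from now (an assumption
# standing in for a real element appearing in the browser).
ready_at = time.monotonic() + 0.2
started = time.monotonic()
value = wait_until(lambda: time.monotonic() >= ready_at,
                   timeout=5, poll_interval=0.05)
elapsed = time.monotonic() - started
print("ready after %.2f s (timeout was 5 s)" % elapsed)
```

The call returns shortly after the condition becomes true rather than after the full 5 seconds, which is the behaviour the question is asking for. With Selenium, the condition would be one of the expected conditions, such as the one in this answer.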

Answer 2

There's no need to scrape/parse the page. The website has an API to request the data directly. You can inspect these requests with devtools (F12) when the page is loading.

To list all the markets for the England Premier League:

import requests

URI_COMPETITIONS = "https://services.topbetta.com.au/api/v2/combined/sports/competitions?sport_name=football"
URI_EVENTS = "https://services.topbetta.com.au/api/v2/combined/events/markets/selections?competition_id=%s"

# Fetch the list of football competitions as JSON.
response = requests.get(URI_COMPETITIONS).json()

for sport in response['data']:
  if sport['name'] == 'Football':

    for base_competition in sport['base_competitions']:
      if base_competition['name'] == 'England Premier League':

        for info_competition in base_competition['competitions']:
          # Request the events, markets and selections for this competition id.
          events_response = requests.get(URI_EVENTS % info_competition['id']).json()

          for competition in events_response['data']:
            print('%s' % competition['name'])

            for event in competition['events']:
              print("  %s  %s" % (event['start_date'], event['name']))

              for market in event['markets']:
                for selection in market['selections']:
                  # Indent selections one level deeper than their event.
                  print("    %s  %s" % (selection['name'], selection['price']))

Which gives:

England Premier League Round 26
  2018-02-06 07:00:00  Watford v Chelsea
    Watford  6
    Draw  3.8
    Chelsea  1.6
England Premier League Round 27
  2018-02-11 02:00:00  Everton v Crystal Palace
    Everton  2.4
    Draw  3.2
    Crystal Palace  3
  2018-02-11 23:00:00  Huddersfield Town v AFC Bournemouth
    Huddersfield Town  3
    Draw  3.2
    AFC Bournemouth  2.4
  2018-02-11 04:30:00  Manchester City v Leicester City
    Manchester City  1.2
    Draw  6.5
    Leicester City  13
...
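
If the data is going to be stored or processed rather than printed, the same nested structure can be flattened into rows. The sketch below is hedged: it assumes the response has the shape the loops above walk ('data', 'events', 'markets', 'selections'), `flatten_selections` is a hypothetical helper name, and the sample payload is hand-built from the first event in the output above instead of being fetched from the API.

```python
def flatten_selections(payload):
    """Yield one (competition, start_date, event, selection, price) tuple
    per selection found in an events/markets/selections payload."""
    for competition in payload.get('data', []):
        for event in competition.get('events', []):
            for market in event.get('markets', []):
                for selection in market.get('selections', []):
                    yield (competition['name'],
                           event['start_date'],
                           event['name'],
                           selection['name'],
                           selection['price'])


# Hand-built sample mirroring the first event printed above; a real
# response may carry additional keys, which this helper ignores.
sample = {
    'data': [{
        'name': 'England Premier League Round 26',
        'events': [{
            'start_date': '2018-02-06 07:00:00',
            'name': 'Watford v Chelsea',
            'markets': [{
                'selections': [
                    {'name': 'Watford', 'price': 6},
                    {'name': 'Draw', 'price': 3.8},
                    {'name': 'Chelsea', 'price': 1.6},
                ],
            }],
        }],
    }],
}

rows = list(flatten_selections(sample))
for row in rows:
    print(row)
```

Each row can then be written to a CSV file or a database without the nested loops being repeated elsewhere.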

2 answers

solution1
0 2018-02-05 05:27:02

solution2
0 2018-02-05 09:42:17
