Can't parse phone numbers from a webpage using requests module

Question

I'm trying to scrape phone numbers from a webpage using the requests module. I already succeeded with selenium, but I'd like to achieve the same with requests. I spent a long time watching the network activity in Chrome dev tools for a clue, but found nothing useful. In case it helps, here is the selenium script I used:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = 'https://www.numberbarn.com/search?state=New%20Jersey'

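with webdriver.Chrome() as driver:
    driver.get(url)
    # Wait up to 10 seconds for the JavaScript-rendered result containers to appear.
    for item in WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".results-list .container"))):
        # Within each container, wait for the element that holds the phone number.
        phone = WebDriverWait(item, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, ".telephone-number"))).text
        print(phone)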

How can I parse phone numbers from the above webpage using the requests module?

Answer 1

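Requests is a module that performs plain HTTP requests against a URL; in this case it fetches only the HTML that the server returns for that page, and it does not run any JavaScript.

If you look at the page source (View Source, or the raw response in the Network tab of the developer tools) rather than the rendered DOM, you will see that none of those phone numbers are actually in the HTML, so Requests (fetch) + XPath (parse), or tools like Scrapy, probably will not help you. The response is basically an empty Angular shell (<app-root></app-root>) plus JavaScript bundles: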

  <meta name="twitter:domain" content="numberbarn.com">
  <base href="/">
  <link id="favicon" rel="icon" type="image/x-icon">
  <script async src="//www.googletagmanager.com/gtag/js"></script>
  <script>
    window.dataLayer = window.dataLayer || [];
    function gtag(){dataLayer.push(arguments);}
    gtag('js', new Date());
  </script>
<link rel="stylesheet" href="/angular/styles.d7cd4c8476c1236343ec.css"></head>
<body>
<app-root></app-root>
<link id="brand-stylesheet" rel="stylesheet"/>
<script src="//browser.sentry-cdn.com/5.17.0/bundle.min.js" integrity="sha384-lowBFC6YTkvMIWPORr7+TERnCkZdo5ab00oH5NkFLeQUAmBTLGwJpFjF6djuxJ/5" crossorigin="anonymous"></script>
<script src="/angular/runtime-es2015.cd8b7003cdbc6c84c9fd.js" type="module"></script><script src="/angular/runtime-es5.cd8b7003cdbc6c84c9fd.js" nomodule defer></script><script src="/angular/polyfills-es5.3c509d0a8908a60997e3.js" nomodule defer></script><script src="/angular/polyfills-es2015.ce03948e69242dd06dc0.js" type="module"></script><script src="/angular/vendor-es2015.a7e86119a8ea99d5add3.js" type="module"></script><script src="/angular/vendor-es5.a7e86119a8ea99d5add3.js" nomodule defer></script><script src="/angular/main-es2015.e16ee0657047312eb515.js" type="module"></script><script src="/angular/main-es5.e16ee0657047312eb515.js" nomodule defer></script></body>
</html>

You can also see this with:

curl "https://www.numberbarn.com/search?state=New%20Jersey" > blob.html

and opening blob.html in a text editor.
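The same check can be scripted in Python. This is only a minimal sketch: the class name telephone-number is taken from the Selenium selector in the question, the RENDERED_SAMPLE markup is made up for illustration, and count_elements_with_class and fetch_html are hypothetical helper names, not part of requests.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains a given class name."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.wanted in classes:
            self.count += 1


def count_elements_with_class(html, class_name):
    """Return how many elements in the raw HTML carry class_name."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count


def fetch_html(url):
    """Fetch the raw HTML exactly as the server sends it (no JavaScript is run)."""
    import requests  # third-party; imported here so the counter works without it

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


# The shell from the answer above: no phone-number elements in the raw HTML.
SHELL = "<body><app-root></app-root></body>"
# Hypothetical markup resembling what a browser shows after rendering.
RENDERED_SAMPLE = '<div class="container"><span class="telephone-number">(000) 000-0000</span></div>'

print(count_elements_with_class(SHELL, "telephone-number"))            # 0
print(count_elements_with_class(RENDERED_SAMPLE, "telephone-number"))  # 1
```

Passing the result of fetch_html(url), with the url from the question, to count_elements_with_class should give 0 as long as the server still returns a shell like the one above, which is the sign that the data is filled in later by JavaScript.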

You really do need something like Selenium, which drives a real browser and can read the page after the JavaScript has rendered it.

TL;DR: Requests + XPath only works when the data you want is already in the HTML that the server returns.

solution1
0 2021-06-16 13:23:12
