
Selenium page content mismatch with Google Chrome devtools

I want to scrape data from the page: https://broward.county-taxes.com/public/real_estate/parcels/494101-09-1060/bills

There is a table on the page, and I need to:

  1. get the first row of the table
  2. click on this item
  3. scrape the Ad Valorem Taxes section ('TAXING AUTHORITY' and 'MILLAGE') data

I implemented a script using Python and Selenium, and it works locally (Linux Mint 19). When I deployed it to the server (Ubuntu) and ran the same script there, it did not work.

The problem is that when I load the 'bills' page on the server, the table and its data are not loaded at all. I printed driver.page_source and the table is missing from it.
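To confirm this on both machines, one option is to save driver.page_source to a file and check it for the table offline. The following is a minimal sketch using only the standard library; the class name `bills` is taken from the XPath in the function below, and the helper names `TableClassFinder` and `has_table_with_class` are hypothetical.

```python
from html.parser import HTMLParser


class TableClassFinder(HTMLParser):
    """Records whether any <table> element carries a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            # The class attribute may be missing or valueless, hence the `or ""`.
            classes = (dict(attrs).get("class") or "").split()
            if self.css_class in classes:
                self.found = True


def has_table_with_class(page_source, css_class="bills"):
    """Return True if page_source contains a <table> with css_class."""
    finder = TableClassFinder(css_class)
    finder.feed(page_source)
    return finder.found
```

Calling `has_table_with_class(driver.page_source)` right after the page loads on each machine shows whether the server really receives different markup or whether the table just arrives later.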

Any advice would be appreciated.

The function source code is below.

It is weird that it works locally but not on the server!

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait


    # method of the scraper class (class definition omitted)
    def download_tax_bill_form(self, formatted_apn):
        """
        Scrape county tax bill table by specified 'apn'
        :param formatted_apn: The case apn formatted

        Returns:
            - scraped dictionary object

            {
                ad_valorem_taxes: [
                    {
                        'group_name': 'BROWARD COUNTY GOVERNMENT',
                        'items': [
                            {
                                'name': 'COUNTYWIDE SERVICES',
                                'millage': 5.49990
                            },
                            ...
                        ]
                    },
                    ...
                ]
            }
        """
        if self.driver is None:
            # create chrome driver
            self.driver = self.create_driver()

        # go to bills page by 'apn'
        self.driver.get(f'https://broward.county-taxes.com/public/real_estate/parcels/{formatted_apn}/bills')
        self.driver.implicitly_wait(5)

        # click on the first table row (last year bill)
        WebDriverWait(self.driver, 20).until(
            EC.element_to_be_clickable(
                (
                    By.XPATH,
                    "(//table[@class='table table-hover bills']/tbody)[1]/tr/th/a[1]"
                )
            )
        ).click()

        # get table items from requested table
        ad_valorem_taxes_items = self.driver.find_elements(
            By.XPATH, "//div[@class='row advalorem']/div/table/tbody"
        )

        groups = []
        group = {}

        # transform table results to dict
        for item in ad_valorem_taxes_items:
            class_name = item.get_attribute("class")

            if class_name == 'district-group':
                # a new group starts: store the previous one first
                if group:
                    groups.append(group)

                group = {}

                group_name = item.find_element(By.XPATH, './/tr/th').text
                group["group_name"] = group_name
                group["items"] = []

            elif class_name == 'taxing-authority':
                name = item.find_element(By.XPATH, ".//tr/th[@class='name']").text
                try:
                    millage = float(item.find_element(By.XPATH, ".//tr/td[@class='millage']").text)
                except ValueError:
                    # millage cell is empty or not numeric
                    millage = None

                group_item = {
                    "name": name,
                    "millage": millage
                }
                group["items"].append(group_item)

        # add last group (skip it when no rows were found)
        if group:
            groups.append(group)
        return {"ad_valorem_taxes": groups}
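The grouping step in this function can be checked without a browser if it is separated from the Selenium calls. The sketch below assumes each table body has already been reduced to a `(class_name, name, millage_text)` tuple; the function name `group_tax_rows` and that tuple layout are hypothetical, not part of the original script.

```python
def group_tax_rows(rows):
    """Group rows into the 'ad_valorem_taxes' structure.

    rows: iterable of (class_name, name, millage_text) tuples, in page order.
    """
    groups = []
    group = None
    for class_name, name, millage_text in rows:
        if class_name == "district-group":
            # A district row opens a new group.
            group = {"group_name": name, "items": []}
            groups.append(group)
        elif class_name == "taxing-authority" and group is not None:
            try:
                millage = float(millage_text)
            except (TypeError, ValueError):
                # Empty, missing, or non-numeric millage cell.
                millage = None
            group["items"].append({"name": name, "millage": millage})
    return groups
```

With this split, an empty result from `group_tax_rows` points at the page not rendering the table rather than at the transformation logic.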

My guess is that the User-Agent request header differs on the server, so the site serves different content. Try setting the User-Agent header in your script to the same value your working machine sends.

For example:

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    opts = Options()
    # no inner quotes: they would become part of the user agent string
    opts.add_argument('--user-agent=Your user agent goes here')
    driver = webdriver.Chrome(options=opts)

You can find your user agent by searching for "my user agent", or by inspecting a request in your browser's devtools (Network tab, Headers, Request Headers, User-Agent).
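One case where the header differs is headless mode: headless Chrome reports `HeadlessChrome` in its default user agent. Whether the server driver runs headless is an assumption here, since `create_driver` is not shown in the question. A minimal sketch with hypothetical helper names for spotting this and building the matching Chrome argument:

```python
def is_headless_chrome(user_agent):
    """Return True if the user agent string identifies headless Chrome."""
    return "HeadlessChrome" in user_agent


def as_regular_chrome(user_agent):
    """Rewrite a headless Chrome user agent to look like regular Chrome."""
    return user_agent.replace("HeadlessChrome", "Chrome")


def user_agent_argument(user_agent):
    """Build the Chrome command-line switch that overrides the user agent."""
    return f"--user-agent={user_agent}"
```

The current value can be read from a running driver with `driver.execute_script("return navigator.userAgent")`, and the result of `user_agent_argument(...)` can be passed to `opts.add_argument(...)` before the driver is created.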


 