简体   繁体   中英

How to save data from website by using python + selenium

I have written a script which is opening multiple tabs one by one and taking data from there. Now I am able to get data from the page but when writing in CSV file getting data as per below.

Bedrooms    Bathrooms   Super area  Floor   Status

3 See Dimensions    3 See Dimensions    2100    7 (Out of 23 Floors)    3 See Dimensions

Bedrooms    Bathrooms   Super area  Floor   Status

3 See Dimensions    3 See Dimensions    2100    7 (Out of 23 Floors)    3 See Dimensions

Bedrooms    Bathrooms   Super area  Floor   Status

1   1   520 4 (Out of 40 Floors)    1

Bedrooms    Bathrooms   Super area  Floor   Status

3 See Dimensions    3 See Dimensions    2100    7 (Out of 23 Floors)    3 See Dimensions

Bedrooms    Bathrooms   Super area  Floor   Status

1   1   520 4 (Out of 40 Floors)    1

In the Status column i am getting wrong value.

I have tried:

    # Go through of them and click on each.
        for unique_link in my_needed_links:
            unique_link.click()

            time.sleep(2)
            driver.switch_to_window(driver.window_handles[1])

            def get_elements_by_xpath(driver, xpath):
                return [entry.text for entry in driver.find_elements_by_xpath(xpath)]


            search_entries = [
            ("Bedrooms", "//div[@class='seeBedRoomDimen']"),
            ("Bathrooms", "//div[@class='p_value']"),
            ("Super area", "//span[@id='coveredAreaDisplay']"),
            ("Floor", "//div[@class='p_value truncated']"),
            ("Lift", "//div[@class='p_value']")]

            with open('textfile.csv', 'a+') as f_output:
                csv_output = csv.writer(f_output)
                # Write header
                csv_output.writerow([name for name, xpath in search_entries])
                entries = []
                for name, xpath in search_entries:
                    entries.append(get_elements_by_xpath(driver, xpath))
                csv_output.writerows(zip(*entries))

            get_elements_by_xpath(driver, xpath)

Edit

Entries: as list

[['3 See Dimensions'], ['3 See Dimensions', '4', '3', '1', '2100 sqft', '1400 sqft', '33%', 'Avenue 54', 'Under Construction', "Dec, '20", 'New Property', '₹ 7.90 Cr ₹ 39,50,000 Approx. Registration Charges ₹ 15 Per sq. Unit Monthly\nSee Other Charges', "Santacruz West, Mumbai., Santacruz West, Mumbai - Western Suburbs, Maharashtra What's Nearby", "Next To St Teresa's Convent School & Sacred Heart School on SV Road.", 'East', 'P51800007149 (The project has been registered via MahaRERA registration number: P51800007149 and is available on the website https://maharera.mahaonline.gov.in under registered projects.)', 'Garden/Park, Pool, Main Road', 'Marble, Marbonite, Wooden', '1 Covered', '24 Hours Available', 'No/Rare Powercut', '6', '6', 'Unfurnished', 'Municipal Corporation of Greater Mumbai', 'Freehold', 'Brokers please do not contact', ''], ['2100'], ['7 (Out of 23 Floors)'], ['3 See Dimensions', '4', '3', '1', '2100 sqft', '1400 sqft', '33%', 'Avenue 54 1 Discussion on forum', 'Under Construction', "Dec, '20", 'New Property', '₹ 7.90 Cr ₹ 39,50,000 Approx. Registration Charges ₹ 15 Per sq. Unit Monthly\nSee Other Charges', "Santacruz West, Mumbai., Santacruz West, Mumbai - Western Suburbs, Maharashtra What's Nearby", "Next To St Teresa's Convent School & Sacred Heart School on SV Road.", 'East', 'P51800007149 (The project has been registered via MahaRERA registration number: P51800007149 and is available on the website https://maharera.mahaonline.gov.in under registered projects.)', 'Garden/Park, Pool, Main Road', 'Marble, Marbonite, Wooden', '1 Covered', '24 Hours Available', 'No/Rare Powercut', '6', '6', 'Unfurnished', 'Municipal Corporation of Greater Mumbai', 'Freehold', 'Brokers please do not contact', '']]
[['3 See Dimensions'], ['3 See Dimensions', '4', '3', '1', '2100 sqft', '1400 sqft', '33%', 'Avenue 54 1 Discussion on forum', 'Under Construction', "Dec, '20", 'New Property', '₹ 7.90 Cr ₹ 39,50,000 Approx. Registration Charges ₹ 15 Per sq. Unit Monthly\nSee Other Charges', "Santacruz West, Mumbai., Santacruz West, Mumbai - Western Suburbs, Maharashtra What's Nearby", "Next To St Teresa's Convent School & Sacred Heart School on SV Road.", 'East', 'P51800007149 (The project has been registered via MahaRERA registration number: P51800007149 and is available on the website https://maharera.mahaonline.gov.in under registered projects.)', 'Garden/Park, Pool, Main Road', 'Marble, Marbonite, Wooden', '1 Covered', '24 Hours Available', 'No/Rare Powercut', '6', '6', 'Unfurnished', 'Municipal Corporation of Greater Mumbai', 'Freehold', 'Brokers please do not contact', ''], ['2100'], ['7 (Out of 23 Floors)'], ['3 See Dimensions', '4', '3', '1', '2100 sqft', '1400 sqft', '33%', 'Avenue 54 1 Discussion on forum', 'Under Construction', "Dec, '20", 'New Property', '₹ 7.90 Cr ₹ 39,50,000 Approx. Registration Charges ₹ 15 Per sq. Unit Monthly\nSee Other Charges', "Santacruz West, Mumbai., Santacruz West, Mumbai - Western Suburbs, Maharashtra What's Nearby", "Next To St Teresa's Convent School & Sacred Heart School on SV Road.", 'East', 'P51800007149 (The project has been registered via MahaRERA registration number: P51800007149 and is available on the website https://maharera.mahaonline.gov.in under registered projects.)', 'Garden/Park, Pool, Main Road', 'Marble, Marbonite, Wooden', '1 Covered', '24 Hours Available', 'No/Rare Powercut', '6', '6', 'Unfurnished', 'Municipal Corporation of Greater Mumbai', 'Freehold', 'Brokers please do not contact', '']]

website link: https://www.magicbricks.com/propertyDetails/1-BHK-520-Sq-ft-Multistorey-Apartment-FOR-Sale-Kandivali-West-in-Mumbai&id=4d423333373433343431

Edit 1

my_needed_links = []

list_links = driver.find_elements_by_tag_name("a")

for i in range(0, 2):
    # Get unique links.
    for link in list_links:
        if "https://www.magicbricks.com/propertyDetails/" in link.get_attribute("href"):
            if link not in my_needed_links:
                my_needed_links.append(link)

    # Go through of them and click on each.
        for unique_link in my_needed_links:
            unique_link.click()

            time.sleep(2)
            driver.switch_to_window(driver.window_handles[1])

            def get_elements_by_xpath(driver, xpath):
                return [entry.text for entry in driver.find_elements_by_xpath(xpath)]

            search_entries = [
            ("Bedrooms", "//div[@class='seeBedRoomDimen']"),
            ("Bathrooms", "//div[@class='p_value']"),
            ("Super area", "//span[@id='coveredAreaDisplay']"),
            ("Floor", "//div[@class='p_value truncated']"),
            ("Lift", "//div[@class='p_value']")]

            #with open('textfile.csv', 'a+') as f_output:
            entries = []
            for name, xpath in search_entries:
                entries.append(get_elements_by_xpath(driver, xpath))
                data = [entry for entry in entries if len(entry)==28]
                df = pd.DataFrame(data)
                print (df)
            df.to_csv('nameoffile.csv', mode='a',index=False,encoding='utf-8')
            #df.to_csv('nameoffile.csv',mode='a', index=False,encoding='utf-8')

            get_elements_by_xpath(driver, xpath)
            time.sleep(2)

            driver.close()
            # Switch back to the main tab/window.
            driver.switch_to_window(driver.window_handles[0])     

Thank you in advance. Please suggest something

I could not load the page due to my location. But from your entries, you could do:

 #Your selenium imports
import pandas as pd

def get_elements_by_xpath(driver, xpath):
    return [entry.text for entry in driver.find_elements_by_xpath(xpath)]


for unique_link in my_needed_links:
    unique_link.click()
    time.sleep(2)
    driver.switch_to_window(driver.window_handles[1])
    search_entries = [
        ("Bedrooms", "//div[@class='seeBedRoomDimen']"), ("Bathrooms", "//div[@class='p_value']"),("Super area", "//span[@id='coveredAreaDisplay']"),("Floor", "//div[@class='p_value truncated']"),("Lift", "//div[@class='p_value']")]

    entries = []
    for name, xpath in search_entries:
        entries.append(get_elements_by_xpath(driver, xpath))

    data = [entry for entry in entries if len(entry)>5]

    df = pd.DataFrame(data)

    df.drop_duplicates(inplace=True)

    df.to_csv('nameoffile.csv', sep=';',index=False,encoding='utf-8',mode='a')

    get_elements_by_xpath(driver, xpath)

The xpath for bathrooms and for lift are the same, therefore you get the same results in these columns. Try to find another way to identify and distinguish between them. You can probably use an index, though if there's another way it's usually preferred.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM