How to save data from a website using Python + Selenium

Question

I have written a script that opens multiple tabs one by one and takes data from each of them. I can now get the data from the page, but when I write it to a CSV file, I get the following:

    Bedrooms          Bathrooms         Super area  Floor                 Status

    3 See Dimensions  3 See Dimensions  2100        7 (Out of 23 Floors)  3 See Dimensions

    Bedrooms          Bathrooms         Super area  Floor                 Status

    3 See Dimensions  3 See Dimensions  2100        7 (Out of 23 Floors)  3 See Dimensions

    Bedrooms          Bathrooms         Super area  Floor                 Status

    1                 1                 520         4 (Out of 40 Floors)  1

    Bedrooms          Bathrooms         Super area  Floor                 Status

    3 See Dimensions  3 See Dimensions  2100        7 (Out of 23 Floors)  3 See Dimensions

    Bedrooms          Bathrooms         Super area  Floor                 Status

    1                 1                 520         4 (Out of 40 Floors)  1

In the Status column I am getting the wrong value.

I have tried:

    import csv
    import time

    from selenium.webdriver.common.by import By

    # `driver` and `my_needed_links` are created earlier in the script.
    def get_elements_by_xpath(driver, xpath):
        return [entry.text for entry in driver.find_elements(By.XPATH, xpath)]

    search_entries = [
        ("Bedrooms", "//div[@class='seeBedRoomDimen']"),
        ("Bathrooms", "//div[@class='p_value']"),
        ("Super area", "//span[@id='coveredAreaDisplay']"),
        ("Floor", "//div[@class='p_value truncated']"),
        ("Lift", "//div[@class='p_value']")]

    # Go through the links and click on each.
    for unique_link in my_needed_links:
        unique_link.click()

        time.sleep(2)
        driver.switch_to.window(driver.window_handles[1])

        with open('textfile.csv', 'a+') as f_output:
            csv_output = csv.writer(f_output)
            # Write header (this happens once per listing)
            csv_output.writerow([name for name, xpath in search_entries])
            entries = []
            for name, xpath in search_entries:
                entries.append(get_elements_by_xpath(driver, xpath))
            csv_output.writerows(zip(*entries))
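The loop above writes the header row every time it opens the file, i.e. once per listing, which is why the "Bedrooms  Bathrooms ..." line repeats in the output. A minimal sketch of writing the header only once, assuming each scraped page is turned into a dict keyed by the names in search_entries (append_rows and FIELDNAMES are hypothetical names, not part of the question's script):

```python
import csv
import os

# Column names taken from the question's search_entries.
FIELDNAMES = ["Bedrooms", "Bathrooms", "Super area", "Floor", "Lift"]


def append_rows(path, rows, fieldnames=FIELDNAMES):
    """Append dict rows to a CSV file, writing the header only once.

    The header is written only when the file does not exist yet or is empty,
    so calling this once per scraped listing never repeats the header line.
    """
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # The csv module documentation recommends newline="" for file objects.
    with open(path, "a", newline="", encoding="utf-8") as f_output:
        writer = csv.DictWriter(f_output, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)
```

Inside the scraping loop, one row per page could be built with dict(zip(FIELDNAMES, values)); which values are correct still depends on fixing the duplicated XPath discussed in Answer 2.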

Edit

Contents of entries, printed as a list:

[['3 See Dimensions'], ['3 See Dimensions', '4', '3', '1', '2100 sqft', '1400 sqft', '33%', 'Avenue 54', 'Under Construction', "Dec, '20", 'New Property', '₹ 7.90 Cr ₹ 39,50,000 Approx. Registration Charges ₹ 15 Per sq. Unit Monthly\nSee Other Charges', "Santacruz West, Mumbai., Santacruz West, Mumbai - Western Suburbs, Maharashtra What's Nearby", "Next To St Teresa's Convent School & Sacred Heart School on SV Road.", 'East', 'P51800007149 (The project has been registered via MahaRERA registration number: P51800007149 and is available on the website https://maharera.mahaonline.gov.in under registered projects.)', 'Garden/Park, Pool, Main Road', 'Marble, Marbonite, Wooden', '1 Covered', '24 Hours Available', 'No/Rare Powercut', '6', '6', 'Unfurnished', 'Municipal Corporation of Greater Mumbai', 'Freehold', 'Brokers please do not contact', ''], ['2100'], ['7 (Out of 23 Floors)'], ['3 See Dimensions', '4', '3', '1', '2100 sqft', '1400 sqft', '33%', 'Avenue 54 1 Discussion on forum', 'Under Construction', "Dec, '20", 'New Property', '₹ 7.90 Cr ₹ 39,50,000 Approx. Registration Charges ₹ 15 Per sq. Unit Monthly\nSee Other Charges', "Santacruz West, Mumbai., Santacruz West, Mumbai - Western Suburbs, Maharashtra What's Nearby", "Next To St Teresa's Convent School & Sacred Heart School on SV Road.", 'East', 'P51800007149 (The project has been registered via MahaRERA registration number: P51800007149 and is available on the website https://maharera.mahaonline.gov.in under registered projects.)', 'Garden/Park, Pool, Main Road', 'Marble, Marbonite, Wooden', '1 Covered', '24 Hours Available', 'No/Rare Powercut', '6', '6', 'Unfurnished', 'Municipal Corporation of Greater Mumbai', 'Freehold', 'Brokers please do not contact', '']]
[['3 See Dimensions'], ['3 See Dimensions', '4', '3', '1', '2100 sqft', '1400 sqft', '33%', 'Avenue 54 1 Discussion on forum', 'Under Construction', "Dec, '20", 'New Property', '₹ 7.90 Cr ₹ 39,50,000 Approx. Registration Charges ₹ 15 Per sq. Unit Monthly\nSee Other Charges', "Santacruz West, Mumbai., Santacruz West, Mumbai - Western Suburbs, Maharashtra What's Nearby", "Next To St Teresa's Convent School & Sacred Heart School on SV Road.", 'East', 'P51800007149 (The project has been registered via MahaRERA registration number: P51800007149 and is available on the website https://maharera.mahaonline.gov.in under registered projects.)', 'Garden/Park, Pool, Main Road', 'Marble, Marbonite, Wooden', '1 Covered', '24 Hours Available', 'No/Rare Powercut', '6', '6', 'Unfurnished', 'Municipal Corporation of Greater Mumbai', 'Freehold', 'Brokers please do not contact', ''], ['2100'], ['7 (Out of 23 Floors)'], ['3 See Dimensions', '4', '3', '1', '2100 sqft', '1400 sqft', '33%', 'Avenue 54 1 Discussion on forum', 'Under Construction', "Dec, '20", 'New Property', '₹ 7.90 Cr ₹ 39,50,000 Approx. Registration Charges ₹ 15 Per sq. Unit Monthly\nSee Other Charges', "Santacruz West, Mumbai., Santacruz West, Mumbai - Western Suburbs, Maharashtra What's Nearby", "Next To St Teresa's Convent School & Sacred Heart School on SV Road.", 'East', 'P51800007149 (The project has been registered via MahaRERA registration number: P51800007149 and is available on the website https://maharera.mahaonline.gov.in under registered projects.)', 'Garden/Park, Pool, Main Road', 'Marble, Marbonite, Wooden', '1 Covered', '24 Hours Available', 'No/Rare Powercut', '6', '6', 'Unfurnished', 'Municipal Corporation of Greater Mumbai', 'Freehold', 'Brokers please do not contact', '']]
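These entries explain the CSV output. zip(*entries) stops at the shortest input list, so each page yields exactly one row, and both columns that use //div[@class='p_value'] (Bathrooms and Lift/Status) receive the first p_value item, which has the same text as the Bedrooms cell. A small reproduction using the values printed above, with the two 28-item p_value lists shortened to their first five items:

```python
# The entries printed above, abbreviated: the p_value lists are cut down
# to their first five items (the full lists have 28 items each).
entries = [
    ["3 See Dimensions"],                              # Bedrooms
    ["3 See Dimensions", "4", "3", "1", "2100 sqft"],  # Bathrooms (p_value)
    ["2100"],                                          # Super area
    ["7 (Out of 23 Floors)"],                          # Floor
    ["3 See Dimensions", "4", "3", "1", "2100 sqft"],  # Lift (p_value again)
]

# zip() stops at the shortest list (length 1), so only one row is produced,
# and each p_value column contributes only its first item.
rows = list(zip(*entries))
print(rows)
# -> [('3 See Dimensions', '3 See Dimensions', '2100', '7 (Out of 23 Floors)', '3 See Dimensions')]
```

That tuple is exactly the first data row of the CSV output at the top of the question.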

Website link: https://www.magicbricks.com/propertyDetails/1-BHK-520-Sq-ft-Multistorey-Apartment-FOR-Sale-Kandivali-West-in-Mumbai&id=4d423333373433343431

Edit 1

    import time

    import pandas as pd
    from selenium.webdriver.common.by import By

    # `driver` is an open WebDriver showing the search-results page.
    def get_elements_by_xpath(driver, xpath):
        return [entry.text for entry in driver.find_elements(By.XPATH, xpath)]

    search_entries = [
        ("Bedrooms", "//div[@class='seeBedRoomDimen']"),
        ("Bathrooms", "//div[@class='p_value']"),
        ("Super area", "//span[@id='coveredAreaDisplay']"),
        ("Floor", "//div[@class='p_value truncated']"),
        ("Lift", "//div[@class='p_value']")]

    my_needed_links = []

    list_links = driver.find_elements(By.TAG_NAME, "a")

    for i in range(0, 2):
        # Get unique links.
        for link in list_links:
            # get_attribute returns None for an <a> without href, so fall back to "".
            if "https://www.magicbricks.com/propertyDetails/" in (link.get_attribute("href") or ""):
                if link not in my_needed_links:
                    my_needed_links.append(link)

            # Go through the links and click on each.
            # Note: this loop is nested inside `for link in list_links:`,
            # so it runs once per anchor on the page.
            for unique_link in my_needed_links:
                unique_link.click()

                time.sleep(2)
                driver.switch_to.window(driver.window_handles[1])

                entries = []
                for name, xpath in search_entries:
                    entries.append(get_elements_by_xpath(driver, xpath))
                # Keep only the 28-item lists (the p_value results).
                data = [entry for entry in entries if len(entry) == 28]
                df = pd.DataFrame(data)
                print(df)
                df.to_csv('nameoffile.csv', mode='a', index=False, encoding='utf-8')

                time.sleep(2)

                driver.close()
                # Switch back to the main tab/window.
                driver.switch_to.window(driver.window_handles[0])
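In the Edit 1 code, the for unique_link loop sits inside for link in list_links:, so every anchor on the results page triggers another pass over all listings collected so far. The pure-Python simulation below (two hypothetical listing URLs plus one unrelated link) produces an a, a, b, a, b visit order, which is consistent with the repeated 2100 sq ft and 520 sq ft rows in the output at the top of the question. Separately, link not in my_needed_links compares Selenium WebElement objects rather than URLs, so two different <a> tags pointing at the same listing would both be kept. Collecting unique href strings first avoids both problems. This is a sketch under those assumptions, not tested against the real site; in the script, hrefs would come from [a.get_attribute("href") for a in driver.find_elements(By.TAG_NAME, "a")].

```python
PREFIX = "https://www.magicbricks.com/propertyDetails/"


def visits_with_nested_loop(hrefs):
    """Mimic Edit 1: the click loop runs once per anchor, over every
    listing collected so far."""
    needed, visited = [], []
    for href in hrefs:
        if href and PREFIX in href and href not in needed:
            needed.append(href)
        for url in needed:  # nested, like the question's click loop
            visited.append(url)
    return visited


def unique_property_urls(hrefs):
    """Collect each listing URL once, in page order, skipping None hrefs."""
    seen = set()
    urls = []
    for href in hrefs:
        if href and PREFIX in href and href not in seen:
            seen.add(href)
            urls.append(href)
    return urls


def short_names(urls):
    """Strip the common prefix so the printed lists are easy to read."""
    return [u[len(PREFIX):] for u in urls]


# Hypothetical anchors: two listings and one unrelated link.
A = PREFIX + "listing-a"
B = PREFIX + "listing-b"
hrefs = [A, B, "https://www.magicbricks.com/"]

print(short_names(visits_with_nested_loop(hrefs)))
# -> ['listing-a', 'listing-a', 'listing-b', 'listing-a', 'listing-b']
print(short_names(unique_property_urls(hrefs)))
# -> ['listing-a', 'listing-b']
```

One design option is to open each collected URL with driver.get(url) in the same tab instead of clicking, which also removes the window switching; that is an alternative, not what the question's code does.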

Thanks in advance. Any suggestions are welcome.

Answer 1

I could not load the page because of my location, but based on your entries, you could do the following:

    # Your Selenium imports
    import time

    import pandas as pd
    from selenium.webdriver.common.by import By


    def get_elements_by_xpath(driver, xpath):
        return [entry.text for entry in driver.find_elements(By.XPATH, xpath)]


    search_entries = [
        ("Bedrooms", "//div[@class='seeBedRoomDimen']"),
        ("Bathrooms", "//div[@class='p_value']"),
        ("Super area", "//span[@id='coveredAreaDisplay']"),
        ("Floor", "//div[@class='p_value truncated']"),
        ("Lift", "//div[@class='p_value']")]

    for unique_link in my_needed_links:
        unique_link.click()
        time.sleep(2)
        driver.switch_to.window(driver.window_handles[1])

        entries = []
        for name, xpath in search_entries:
            entries.append(get_elements_by_xpath(driver, xpath))

        # Keep only the long lists (the p_value results); short ones are dropped.
        data = [entry for entry in entries if len(entry) > 5]

        df = pd.DataFrame(data)

        # Bathrooms and Lift use the same XPath, so duplicate rows are dropped.
        df.drop_duplicates(inplace=True)

        df.to_csv('nameoffile.csv', sep=';', index=False, encoding='utf-8', mode='a')

        # Close the listing tab and return to the results tab before the next click.
        driver.close()
        driver.switch_to.window(driver.window_handles[0])

Answer 2

The XPath for Bathrooms and for Lift is the same (//div[@class='p_value']), so both columns get the same results. Find another way to identify and distinguish them. You can probably use an index, although if there is another way, it is usually preferable.
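The index-versus-label distinction can be illustrated without a browser. The sketch below parses a small, hypothetical fragment with Python's standard xml.etree.ElementTree (the real MagicBricks markup was not posted, so the p_title/p_value pairing, the p_row wrapper and the values are assumptions, and real HTML is often not well-formed XML). In Selenium the rough XPath equivalents, under the same structural assumption, would be (//div[@class='p_value'])[2] for the index and //div[@class='p_title'][normalize-space()='Bathrooms']/following-sibling::div[@class='p_value'][1] for the label.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup. The real page structure was not posted,
# so the p_row/p_title/p_value layout and these values are assumptions.
SAMPLE = """
<div>
  <div class="p_row"><div class="p_title">Bedrooms</div><div class="p_value">3</div></div>
  <div class="p_row"><div class="p_title">Bathrooms</div><div class="p_value">2</div></div>
  <div class="p_row"><div class="p_title">Lift</div><div class="p_value">1</div></div>
</div>
"""


def value_by_index(root, position):
    """Index approach: the position-th (1-based) div whose class is exactly
    'p_value', like the XPath (//div[@class='p_value'])[position]."""
    values = root.findall(".//div[@class='p_value']")
    return values[position - 1].text


def value_by_label(root, label):
    """Label approach: the p_value div that shares a parent with the p_title
    div whose text equals `label`; returns None if the label is missing."""
    for row in root.iter("div"):
        title = row.find("div[@class='p_title']")
        value = row.find("div[@class='p_value']")
        if title is not None and value is not None and title.text == label:
            return value.text
    return None


root = ET.fromstring(SAMPLE.strip())
print(value_by_index(root, 2))            # -> 2 (correct only while the order is fixed)
print(value_by_label(root, "Bathrooms"))  # -> 2
print(value_by_label(root, "Lift"))       # -> 1
```

The index version silently returns the wrong field if the page inserts or omits one p_value block, while the label version either finds the right field or returns None, which is why a label-based lookup is usually preferred when the page provides one.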

2 answers

Answer 1: score 0, 2018-09-01 21:00:40

Answer 2: score 0, 2018-09-02 05:34:58
