簡體   English   中英

如何使用python + Selenium從網站保存數據

[英]How to save data from website by using python + selenium

我編寫了一個腳本,該腳本一個接一個地打開多個選項卡並從那里獲取數據。 現在,我可以從頁面中獲取數據,但是在寫入CSV文件時,可以按照以下方式獲取數據。

Bedrooms    Bathrooms   Super area  Floor   Status

3 See Dimensions    3 See Dimensions    2100    7 (Out of 23 Floors)    3 See Dimensions

Bedrooms    Bathrooms   Super area  Floor   Status

3 See Dimensions    3 See Dimensions    2100    7 (Out of 23 Floors)    3 See Dimensions

Bedrooms    Bathrooms   Super area  Floor   Status

1   1   520 4 (Out of 40 Floors)    1

Bedrooms    Bathrooms   Super area  Floor   Status

3 See Dimensions    3 See Dimensions    2100    7 (Out of 23 Floors)    3 See Dimensions

Bedrooms    Bathrooms   Super area  Floor   Status

1   1   520 4 (Out of 40 Floors)    1

Status列中,我得到了錯誤的值。

我努力了:

    # Go through of them and click on each.
        for unique_link in my_needed_links:
            unique_link.click()

            time.sleep(2)
            driver.switch_to_window(driver.window_handles[1])

            def get_elements_by_xpath(driver, xpath):
                return [entry.text for entry in driver.find_elements_by_xpath(xpath)]


            search_entries = [
            ("Bedrooms", "//div[@class='seeBedRoomDimen']"),
            ("Bathrooms", "//div[@class='p_value']"),
            ("Super area", "//span[@id='coveredAreaDisplay']"),
            ("Floor", "//div[@class='p_value truncated']"),
            ("Lift", "//div[@class='p_value']")]

            with open('textfile.csv', 'a+') as f_output:
                csv_output = csv.writer(f_output)
                # Write header
                csv_output.writerow([name for name, xpath in search_entries])
                entries = []
                for name, xpath in search_entries:
                    entries.append(get_elements_by_xpath(driver, xpath))
                csv_output.writerows(zip(*entries))

            get_elements_by_xpath(driver, xpath)

編輯

條目:作為列表

[['3 See Dimensions'], ['3 See Dimensions', '4', '3', '1', '2100 sqft', '1400 sqft', '33%', 'Avenue 54', 'Under Construction', "Dec, '20", 'New Property', '₹ 7.90 Cr ₹ 39,50,000 Approx. Registration Charges ₹ 15 Per sq. Unit Monthly\nSee Other Charges', "Santacruz West, Mumbai., Santacruz West, Mumbai - Western Suburbs, Maharashtra What's Nearby", "Next To St Teresa's Convent School & Sacred Heart School on SV Road.", 'East', 'P51800007149 (The project has been registered via MahaRERA registration number: P51800007149 and is available on the website https://maharera.mahaonline.gov.in under registered projects.)', 'Garden/Park, Pool, Main Road', 'Marble, Marbonite, Wooden', '1 Covered', '24 Hours Available', 'No/Rare Powercut', '6', '6', 'Unfurnished', 'Municipal Corporation of Greater Mumbai', 'Freehold', 'Brokers please do not contact', ''], ['2100'], ['7 (Out of 23 Floors)'], ['3 See Dimensions', '4', '3', '1', '2100 sqft', '1400 sqft', '33%', 'Avenue 54 1 Discussion on forum', 'Under Construction', "Dec, '20", 'New Property', '₹ 7.90 Cr ₹ 39,50,000 Approx. Registration Charges ₹ 15 Per sq. Unit Monthly\nSee Other Charges', "Santacruz West, Mumbai., Santacruz West, Mumbai - Western Suburbs, Maharashtra What's Nearby", "Next To St Teresa's Convent School & Sacred Heart School on SV Road.", 'East', 'P51800007149 (The project has been registered via MahaRERA registration number: P51800007149 and is available on the website https://maharera.mahaonline.gov.in under registered projects.)', 'Garden/Park, Pool, Main Road', 'Marble, Marbonite, Wooden', '1 Covered', '24 Hours Available', 'No/Rare Powercut', '6', '6', 'Unfurnished', 'Municipal Corporation of Greater Mumbai', 'Freehold', 'Brokers please do not contact', '']]
[['3 See Dimensions'], ['3 See Dimensions', '4', '3', '1', '2100 sqft', '1400 sqft', '33%', 'Avenue 54 1 Discussion on forum', 'Under Construction', "Dec, '20", 'New Property', '₹ 7.90 Cr ₹ 39,50,000 Approx. Registration Charges ₹ 15 Per sq. Unit Monthly\nSee Other Charges', "Santacruz West, Mumbai., Santacruz West, Mumbai - Western Suburbs, Maharashtra What's Nearby", "Next To St Teresa's Convent School & Sacred Heart School on SV Road.", 'East', 'P51800007149 (The project has been registered via MahaRERA registration number: P51800007149 and is available on the website https://maharera.mahaonline.gov.in under registered projects.)', 'Garden/Park, Pool, Main Road', 'Marble, Marbonite, Wooden', '1 Covered', '24 Hours Available', 'No/Rare Powercut', '6', '6', 'Unfurnished', 'Municipal Corporation of Greater Mumbai', 'Freehold', 'Brokers please do not contact', ''], ['2100'], ['7 (Out of 23 Floors)'], ['3 See Dimensions', '4', '3', '1', '2100 sqft', '1400 sqft', '33%', 'Avenue 54 1 Discussion on forum', 'Under Construction', "Dec, '20", 'New Property', '₹ 7.90 Cr ₹ 39,50,000 Approx. Registration Charges ₹ 15 Per sq. Unit Monthly\nSee Other Charges', "Santacruz West, Mumbai., Santacruz West, Mumbai - Western Suburbs, Maharashtra What's Nearby", "Next To St Teresa's Convent School & Sacred Heart School on SV Road.", 'East', 'P51800007149 (The project has been registered via MahaRERA registration number: P51800007149 and is available on the website https://maharera.mahaonline.gov.in under registered projects.)', 'Garden/Park, Pool, Main Road', 'Marble, Marbonite, Wooden', '1 Covered', '24 Hours Available', 'No/Rare Powercut', '6', '6', 'Unfurnished', 'Municipal Corporation of Greater Mumbai', 'Freehold', 'Brokers please do not contact', '']]

網站鏈接: https://www.magicbricks.com/propertyDetails/1-BHK-520-Sq-ft-Multistorey-Apartment-FOR-Sale-Kandivali-West-in-Mumbai&id=4d423333373433343431 : https://www.magicbricks.com/propertyDetails/1-BHK-520-Sq-ft-Multistorey-Apartment-FOR-Sale-Kandivali-West-in-Mumbai&id=4d423333373433343431

編輯1

my_needed_links = []

list_links = driver.find_elements_by_tag_name("a")

for i in range(0, 2):
    # Get unique links.
    for link in list_links:
        if "https://www.magicbricks.com/propertyDetails/" in link.get_attribute("href"):
            if link not in my_needed_links:
                my_needed_links.append(link)

    # Go through of them and click on each.
        for unique_link in my_needed_links:
            unique_link.click()

            time.sleep(2)
            driver.switch_to_window(driver.window_handles[1])

            def get_elements_by_xpath(driver, xpath):
                return [entry.text for entry in driver.find_elements_by_xpath(xpath)]

            search_entries = [
            ("Bedrooms", "//div[@class='seeBedRoomDimen']"),
            ("Bathrooms", "//div[@class='p_value']"),
            ("Super area", "//span[@id='coveredAreaDisplay']"),
            ("Floor", "//div[@class='p_value truncated']"),
            ("Lift", "//div[@class='p_value']")]

            #with open('textfile.csv', 'a+') as f_output:
            entries = []
            for name, xpath in search_entries:
                entries.append(get_elements_by_xpath(driver, xpath))
                data = [entry for entry in entries if len(entry)==28]
                df = pd.DataFrame(data)
                print (df)
            df.to_csv('nameoffile.csv', mode='a',index=False,encoding='utf-8')
            #df.to_csv('nameoffile.csv',mode='a', index=False,encoding='utf-8')

            get_elements_by_xpath(driver, xpath)
            time.sleep(2)

            driver.close()
            # Switch back to the main tab/window.
            driver.switch_to_window(driver.window_handles[0])     

先感謝您。 請提出一些建議

由於位置原因,我無法加載頁面。 但是從您的輸入中,您可以執行以下操作:

 #Your selenium imports
import pandas as pd

def get_elements_by_xpath(driver, xpath):
    return [entry.text for entry in driver.find_elements_by_xpath(xpath)]


for unique_link in my_needed_links:
    unique_link.click()
    time.sleep(2)
    driver.switch_to_window(driver.window_handles[1])
    search_entries = [
        ("Bedrooms", "//div[@class='seeBedRoomDimen']"), ("Bathrooms", "//div[@class='p_value']"),("Super area", "//span[@id='coveredAreaDisplay']"),("Floor", "//div[@class='p_value truncated']"),("Lift", "//div[@class='p_value']")]

    entries = []
    for name, xpath in search_entries:
        entries.append(get_elements_by_xpath(driver, xpath))

    data = [entry for entry in entries if len(entry)>5]

    df = pd.DataFrame(data)

    df.drop_duplicates(inplace=True)

    df.to_csv('nameoffile.csv', sep=';',index=False,encoding='utf-8',mode='a')

    get_elements_by_xpath(driver, xpath)

浴室和電梯的xpath相同,因此在這些列中得到的結果相同。 嘗試尋找另一種方法來識別和區分它們。 您可能可以使用索引,盡管如果有另一種方法通常更喜歡它。

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM