
How to save data from a website using Python + Selenium

I have written a script that opens multiple tabs one by one and scrapes data from each. I can now get the data from the page, but when writing it to a CSV file the output looks like this:

Bedrooms    Bathrooms   Super area  Floor   Status

3 See Dimensions    3 See Dimensions    2100    7 (Out of 23 Floors)    3 See Dimensions

Bedrooms    Bathrooms   Super area  Floor   Status

3 See Dimensions    3 See Dimensions    2100    7 (Out of 23 Floors)    3 See Dimensions

Bedrooms    Bathrooms   Super area  Floor   Status

1   1   520 4 (Out of 40 Floors)    1

Bedrooms    Bathrooms   Super area  Floor   Status

3 See Dimensions    3 See Dimensions    2100    7 (Out of 23 Floors)    3 See Dimensions

Bedrooms    Bathrooms   Super area  Floor   Status

1   1   520 4 (Out of 40 Floors)    1

In the Status column I am getting the wrong value.

I have tried:

# Go through them and click on each.
for unique_link in my_needed_links:
    unique_link.click()

    time.sleep(2)
    driver.switch_to_window(driver.window_handles[1])

    def get_elements_by_xpath(driver, xpath):
        return [entry.text for entry in driver.find_elements_by_xpath(xpath)]

    search_entries = [
        ("Bedrooms", "//div[@class='seeBedRoomDimen']"),
        ("Bathrooms", "//div[@class='p_value']"),
        ("Super area", "//span[@id='coveredAreaDisplay']"),
        ("Floor", "//div[@class='p_value truncated']"),
        ("Lift", "//div[@class='p_value']")]

    with open('textfile.csv', 'a+', newline='') as f_output:
        csv_output = csv.writer(f_output)
        # Write the header row.
        csv_output.writerow([name for name, xpath in search_entries])
        entries = []
        for name, xpath in search_entries:
            entries.append(get_elements_by_xpath(driver, xpath))
        csv_output.writerows(zip(*entries))
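A side note on the CSV-writing step: `zip(*entries)` stops at the shortest of the per-column lists, so when one XPath matches dozens of elements and another matches only one, most of the scraped values are silently dropped. A minimal sketch with made-up values:

```python
# Hypothetical per-column lists, mimicking uneven XPath match counts.
bedrooms = ['3 See Dimensions']   # 1 match
bathrooms = ['3', '4', '1']       # 3 matches
super_area = ['2100']             # 1 match

# zip truncates to the shortest input, so only one row survives.
rows = list(zip(bedrooms, bathrooms, super_area))
print(rows)  # [('3 See Dimensions', '3', '2100')]
```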

Edit

Entries, printed as a list:

[['3 See Dimensions'], ['3 See Dimensions', '4', '3', '1', '2100 sqft', '1400 sqft', '33%', 'Avenue 54', 'Under Construction', "Dec, '20", 'New Property', '₹ 7.90 Cr ₹ 39,50,000 Approx. Registration Charges ₹ 15 Per sq. Unit Monthly\nSee Other Charges', "Santacruz West, Mumbai., Santacruz West, Mumbai - Western Suburbs, Maharashtra What's Nearby", "Next To St Teresa's Convent School & Sacred Heart School on SV Road.", 'East', 'P51800007149 (The project has been registered via MahaRERA registration number: P51800007149 and is available on the website https://maharera.mahaonline.gov.in under registered projects.)', 'Garden/Park, Pool, Main Road', 'Marble, Marbonite, Wooden', '1 Covered', '24 Hours Available', 'No/Rare Powercut', '6', '6', 'Unfurnished', 'Municipal Corporation of Greater Mumbai', 'Freehold', 'Brokers please do not contact', ''], ['2100'], ['7 (Out of 23 Floors)'], ['3 See Dimensions', '4', '3', '1', '2100 sqft', '1400 sqft', '33%', 'Avenue 54 1 Discussion on forum', 'Under Construction', "Dec, '20", 'New Property', '₹ 7.90 Cr ₹ 39,50,000 Approx. Registration Charges ₹ 15 Per sq. Unit Monthly\nSee Other Charges', "Santacruz West, Mumbai., Santacruz West, Mumbai - Western Suburbs, Maharashtra What's Nearby", "Next To St Teresa's Convent School & Sacred Heart School on SV Road.", 'East', 'P51800007149 (The project has been registered via MahaRERA registration number: P51800007149 and is available on the website https://maharera.mahaonline.gov.in under registered projects.)', 'Garden/Park, Pool, Main Road', 'Marble, Marbonite, Wooden', '1 Covered', '24 Hours Available', 'No/Rare Powercut', '6', '6', 'Unfurnished', 'Municipal Corporation of Greater Mumbai', 'Freehold', 'Brokers please do not contact', '']]
[['3 See Dimensions'], ['3 See Dimensions', '4', '3', '1', '2100 sqft', '1400 sqft', '33%', 'Avenue 54 1 Discussion on forum', 'Under Construction', "Dec, '20", 'New Property', '₹ 7.90 Cr ₹ 39,50,000 Approx. Registration Charges ₹ 15 Per sq. Unit Monthly\nSee Other Charges', "Santacruz West, Mumbai., Santacruz West, Mumbai - Western Suburbs, Maharashtra What's Nearby", "Next To St Teresa's Convent School & Sacred Heart School on SV Road.", 'East', 'P51800007149 (The project has been registered via MahaRERA registration number: P51800007149 and is available on the website https://maharera.mahaonline.gov.in under registered projects.)', 'Garden/Park, Pool, Main Road', 'Marble, Marbonite, Wooden', '1 Covered', '24 Hours Available', 'No/Rare Powercut', '6', '6', 'Unfurnished', 'Municipal Corporation of Greater Mumbai', 'Freehold', 'Brokers please do not contact', ''], ['2100'], ['7 (Out of 23 Floors)'], ['3 See Dimensions', '4', '3', '1', '2100 sqft', '1400 sqft', '33%', 'Avenue 54 1 Discussion on forum', 'Under Construction', "Dec, '20", 'New Property', '₹ 7.90 Cr ₹ 39,50,000 Approx. Registration Charges ₹ 15 Per sq. Unit Monthly\nSee Other Charges', "Santacruz West, Mumbai., Santacruz West, Mumbai - Western Suburbs, Maharashtra What's Nearby", "Next To St Teresa's Convent School & Sacred Heart School on SV Road.", 'East', 'P51800007149 (The project has been registered via MahaRERA registration number: P51800007149 and is available on the website https://maharera.mahaonline.gov.in under registered projects.)', 'Garden/Park, Pool, Main Road', 'Marble, Marbonite, Wooden', '1 Covered', '24 Hours Available', 'No/Rare Powercut', '6', '6', 'Unfurnished', 'Municipal Corporation of Greater Mumbai', 'Freehold', 'Brokers please do not contact', '']]

Website link: https://www.magicbricks.com/propertyDetails/1-BHK-520-Sq-ft-Multistorey-Apartment-FOR-Sale-Kandivali-West-in-Mumbai&id=4d423333373433343431

Edit 1

my_needed_links = []

list_links = driver.find_elements_by_tag_name("a")

def get_elements_by_xpath(driver, xpath):
    return [entry.text for entry in driver.find_elements_by_xpath(xpath)]

for i in range(0, 2):
    # Get unique links.
    for link in list_links:
        if "https://www.magicbricks.com/propertyDetails/" in link.get_attribute("href"):
            if link not in my_needed_links:
                my_needed_links.append(link)

    # Go through them and click on each.
    for unique_link in my_needed_links:
        unique_link.click()

        time.sleep(2)
        driver.switch_to_window(driver.window_handles[1])

        search_entries = [
            ("Bedrooms", "//div[@class='seeBedRoomDimen']"),
            ("Bathrooms", "//div[@class='p_value']"),
            ("Super area", "//span[@id='coveredAreaDisplay']"),
            ("Floor", "//div[@class='p_value truncated']"),
            ("Lift", "//div[@class='p_value']")]

        entries = []
        for name, xpath in search_entries:
            entries.append(get_elements_by_xpath(driver, xpath))

        # Keep only the lists that hold the full set of 28 values.
        data = [entry for entry in entries if len(entry) == 28]
        df = pd.DataFrame(data)
        print(df)
        df.to_csv('nameoffile.csv', mode='a', index=False, encoding='utf-8')

        time.sleep(2)
        driver.close()
        # Switch back to the main tab/window.
        driver.switch_to_window(driver.window_handles[0])
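Separately from the Status problem, comparing WebElements with `link not in my_needed_links` can let duplicates through, because re-found elements are new objects even when they point at the same anchor. A sketch of deduplicating on the href string instead (a plain list of strings stands in for `link.get_attribute("href")` values here):

```python
# Hypothetical hrefs, standing in for link.get_attribute("href") values.
hrefs = [
    "https://www.magicbricks.com/propertyDetails/abc",
    "https://www.magicbricks.com/propertyDetails/abc",  # duplicate
    "https://www.magicbricks.com/contact",              # not a property page
    "https://www.magicbricks.com/propertyDetails/xyz",
]

seen = set()
needed = []
for href in hrefs:
    # Keep each property-details URL only once, in first-seen order.
    if href.startswith("https://www.magicbricks.com/propertyDetails/") and href not in seen:
        seen.add(href)
        needed.append(href)

print(needed)
```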

Thank you in advance. Please suggest something.

I could not load the page due to my location, but from your entries you could do:

# Your selenium imports
import pandas as pd

def get_elements_by_xpath(driver, xpath):
    return [entry.text for entry in driver.find_elements_by_xpath(xpath)]


for unique_link in my_needed_links:
    unique_link.click()
    time.sleep(2)
    driver.switch_to_window(driver.window_handles[1])

    search_entries = [
        ("Bedrooms", "//div[@class='seeBedRoomDimen']"),
        ("Bathrooms", "//div[@class='p_value']"),
        ("Super area", "//span[@id='coveredAreaDisplay']"),
        ("Floor", "//div[@class='p_value truncated']"),
        ("Lift", "//div[@class='p_value']")]

    entries = []
    for name, xpath in search_entries:
        entries.append(get_elements_by_xpath(driver, xpath))

    # Keep only the lists that hold more than 5 values.
    data = [entry for entry in entries if len(entry) > 5]

    df = pd.DataFrame(data)
    df.drop_duplicates(inplace=True)
    df.to_csv('nameoffile.csv', sep=';', index=False, encoding='utf-8', mode='a')

The XPath for Bathrooms and the XPath for Lift are the same, so you get the same results in both columns. Try to find another way to identify and distinguish between them. You could probably use an index, though another way, if one exists, is usually preferred.
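One such way, sketched below on a tiny hand-written fragment rather than the live page (the `p_title` class name and the label/value sibling layout are assumptions about the site's markup): collect the label elements alongside the value elements and look values up by label instead of by class alone. In Selenium the same idea would be an XPath such as `//div[text()='Lift']/following-sibling::div[@class='p_value']`.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for the page's label/value markup (structure assumed).
fragment = """
<root>
  <div class="p_title">Bathrooms</div><div class="p_value">3</div>
  <div class="p_title">Lift</div><div class="p_value">6</div>
</root>
"""

root = ET.fromstring(fragment)
# Labels and values come back in document order, so zipping pairs them up.
titles = [d.text for d in root.findall(".//div[@class='p_title']")]
values = [d.text for d in root.findall(".//div[@class='p_value']")]
record = dict(zip(titles, values))
print(record)  # {'Bathrooms': '3', 'Lift': '6'}
```

With a dict keyed by label, Bathrooms and Lift can no longer collide even though both values share the `p_value` class.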
