
Fetching data with Selenium and exporting to a csv file

I'm trying to scrape all data from this page: https://icostats.com. I need to export the content of each row to a row in a csv file. Also, I think there's got to be a prettier and more functional way of iterating through each row. So far, here's where I'm at:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait as wait

    def get_css_sel(selector):
        posts = browser.find_elements_by_css_selector(selector)
        for post in posts:
            print(post.text)

    browser = webdriver.Chrome(executable_path=r'C:\Scrapers\chromedriver.exe')
    browser.get("https://icostats.com")
    wait(browser, 20).until(EC.presence_of_element_located((By.CSS_SELECTOR, "#app > div > div.container-0-16 > div.table-0-20 > div.tbody-0-21 > div:nth-child(2) > div:nth-child(8)")))

    get_css_sel("#app > div > div.container-0-16 > div.table-0-20 > div.tableheader-0-50")              #fetch header of table
    get_css_sel("#app > div > div.container-0-16 > div.table-0-20 > div.tbody-0-21 > div:nth-child(1)") #fetch values from 2nd row
    get_css_sel("#app > div > div.container-0-16 > div.table-0-20 > div.tbody-0-21 > div:nth-child(2)") #fetch values from 3rd row[...]
    [...]
    get_css_sel("#app > div > div.container-0-16 > div.table-0-20 > div.tbody-0-21 > div:nth-child(28)") #[...]fetch values from last row

EDIT

As for the row iteration, I thought something like this might do it:

    for row in rows(first, last):
        row += 1
        get_css_sel("#app > div > div.container-0-16 > div.table-0-20 > div.tbody-0-21 > div:nth-child(row)")

But I can't figure out how to express that in actual code.

You can copy the CSS path of an element manually (Chrome's developer tools can do this) and then format the string with:

    get_css_sel("string({})".format(row))
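
To make the placeholder concrete, here is a small sketch of how `str.format` fills the `{}` in a selector string. The selector is the one from the question; `ROW_TEMPLATE` and `row_selector` are hypothetical names used only for illustration, not part of Selenium.

```python
# Selector copied from the question, with {} where the row number goes.
ROW_TEMPLATE = ("#app > div > div.container-0-16 > div.table-0-20 "
                "> div.tbody-0-21 > div:nth-child({})")


def row_selector(row):
    """Return the CSS selector for the given row (nth-child counts from 1)."""
    return ROW_TEMPLATE.format(row)


print(row_selector(3))
```

The resulting string can be passed to `get_css_sel` in place of a hand-written selector.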


Hope this helps!

Edit:

This is how you would iterate over it:

    for row in range(first, last):  # first is included but not last
        # assuming the path is well constructed; the for loop advances row
        # by itself, so no manual "row += 1" is needed
        get_css_sel("#app > div > div.container-0-16 > div.table-0-20 > div.tbody-0-21 > div:nth-child({})".format(row))

I think this will do.
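
The question also asks about exporting each row to a CSV file, which the loop does not cover. A minimal sketch using the standard `csv` module could look like the following. It assumes that the text of a row element comes back with one cell per line, which depends on the page markup and is not confirmed here. `rows_to_csv` and `fetch_row_text` are hypothetical names: `fetch_row_text` stands for any function that takes a CSS selector and returns the text of the matching row.

```python
import csv

# Selector copied from the question, with {} where the row number goes.
ROW_SELECTOR = ("#app > div > div.container-0-16 > div.table-0-20 "
                "> div.tbody-0-21 > div:nth-child({})")


def rows_to_csv(fetch_row_text, first, last, path):
    """Write one CSV line per table row and return the number of rows written.

    fetch_row_text is a hypothetical callable: selector string -> row text.
    """
    written = 0
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for row in range(first, last):  # first is included, last is not
            text = fetch_row_text(ROW_SELECTOR.format(row))
            # Assumption: each cell of the row sits on its own line of text.
            writer.writerow(text.splitlines())
            written += 1
    return written
```

In the scraper from the question, `fetch_row_text` might be something like `lambda sel: browser.find_element(By.CSS_SELECTOR, sel).text`. Passing the lookup in as a function keeps the CSV logic separate from the browser, so it can be tried out without opening a page.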
