
How can I export scraped data to CSV - selenium

I have two functions that scrape data from Google, but I'm not sure how to export my results to a CSV file with header and link columns. Can you help me with this?

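import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.support.ui import WebDriverWait


def get_search_attributes(driver):
    # find_elements(By.XPATH, ...) replaces find_elements_by_xpath,
    # which current Selenium 4 releases no longer provide.

    # Result titles: the <h3> inside each result link
    headers = driver.find_elements(By.XPATH, '//*[@id="rso"]/div/div/div/div/a/h3')
    headers = [header.text for header in headers]

    # Result URLs: the href of each result link
    links = driver.find_elements(By.XPATH, '//*[@id="rso"]/div/div/div/div/a')
    links = [link.get_attribute('href') for link in links]

    headers_df = pd.DataFrame(headers, columns=["headers"])
    links_df = pd.DataFrame(links, columns=["links"])

    return headers_df, links_df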




def search_multiple_pages(driver, page_limit = 5):

    insert_search_value(driver)

    pagecounter = 0

    while pagecounter <= page_limit:
        get_search_attributes(driver)
        next_page_btn = driver.find_elements(By.XPATH, "//a[@id='pnnext']")
        if len(next_page_btn) < 1:
            print('no more pages')
            break
        else:
            element = WebDriverWait(driver, 5).until(expected_conditions.element_to_be_clickable((By.ID, 'pnnext')))
            driver.execute_script("return arguments[0].scrollIntoView();", element)
            element.click()
            pagecounter += 1
    return

header_csv = headers_df.to_csv()  # optional args; with no path, to_csv returns the CSV text
links_csv = links_df.to_csv()

with open("filename.csv", "a") as f:
    f.write(header_csv)
    f.write(links_csv)  # or in any order

For details, see the pandas to_csv documentation and Python's file write method.
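
Note that writing the two CSV strings one after the other stacks the headers block above the links block instead of producing two columns. If side-by-side columns are wanted while keeping the two separate DataFrames, one possible approach is to join them with pd.concat first. The sketch below uses made-up sample values in place of real scraped results.

```python
import pandas as pd

# Stand-ins for the two frames returned by get_search_attributes
# (sample values for illustration, not real search results)
headers_df = pd.DataFrame(["A", "B"], columns=["headers"])
links_df = pd.DataFrame(["https://A", "https://B"], columns=["links"])

# axis=1 places the frames next to each other, aligned on the row index
combined = pd.concat([headers_df, links_df], axis=1)

# With no path argument, to_csv returns the CSV text as a string
csv_text = combined.to_csv(index=False)
print(csv_text)
```

This relies on both frames having the same number of rows; if the two XPath queries return different counts, the shorter column is padded with empty cells.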

You should use a dictionary to put everything in one DataFrame.

#headers_df = pd.DataFrame(headers, columns=["headers"])
#links_df = pd.DataFrame(links, columns=["links"])

df = pd.DataFrame({"headers": headers, "links": links})

df.to_csv(filename)

Example

import pandas as pd

df = pd.DataFrame({
    "Headers": ['A', 'B', 'C'], 
    "Links": ['https://A', 'https://B', 'https://C']
})

print(df)

df.to_csv('data.csv')

Result:

  Headers      Links
0       A  https://A
1       B  https://B
2       C  https://C
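
The loop in the question calls get_search_attributes on every page but never stores what it returns, so nothing is left to export once the loop ends. One way to handle this is to collect one DataFrame per page and write a single CSV at the end. In the sketch below, scrape_page is a hypothetical function standing in for the Selenium calls, its titles and URLs are made-up sample data, and results.csv is a placeholder file name.

```python
import pandas as pd


def scrape_page(page_number):
    """Hypothetical stand-in for scraping one results page with Selenium."""
    headers = [f"Title {page_number}-{i}" for i in range(2)]
    links = [f"https://example.com/{page_number}/{i}" for i in range(2)]
    # One frame per page, with headers and links side by side
    return pd.DataFrame({"headers": headers, "links": links})


page_frames = []
for page_number in range(3):
    # Keep each page's result instead of discarding it
    page_frames.append(scrape_page(page_number))

# ignore_index=True renumbers the rows across all pages
all_results = pd.concat(page_frames, ignore_index=True)

# index=False leaves the row numbers out of the file
all_results.to_csv("results.csv", index=False)
```

Passing index=False omits the row-number column (0, 1, 2) that appears in the result above, so the file contains only the headers and links columns.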

