How can I export scraped data to CSV? (Selenium)

I have two functions that scrape data from Google, but I am not sure how to export my results to a CSV file with "headers" and "links" columns. Could you please help me with that?

import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.support.ui import WebDriverWait


def get_search_attributes(driver):
    # Result titles: the <h3> inside each result link
    headers = driver.find_elements(By.XPATH, '//*[@id="rso"]/div/div/div/div/a/h3')
    headers = [header.text for header in headers]
    # print(headers)

    # The result links themselves
    links = driver.find_elements(By.XPATH, '//*[@id="rso"]/div/div/div/div/a')
    links = [link.get_attribute('href') for link in links]
    # print(links)

    headers_df = pd.DataFrame(headers, columns=["headers"])
    links_df = pd.DataFrame(links, columns=["links"])

    return headers_df, links_df


def search_multiple_pages(driver, page_limit=5):

    insert_search_value(driver)  # defined elsewhere in my code (not shown)

    pagecounter = 0

    while pagecounter <= page_limit:
        get_search_attributes(driver)  # the returned DataFrames are not stored yet
        next_page_btn = driver.find_elements(By.XPATH, "//a[@id='pnnext']")
        if len(next_page_btn) < 1:
            print('no more pages')
            break
        else:
            element = WebDriverWait(driver, 5).until(expected_conditions.element_to_be_clickable((By.ID, 'pnnext')))
            driver.execute_script("return arguments[0].scrollIntoView();", element)
            element.click()
            pagecounter += 1
    return

# Optional arguments go inside to_csv(); with no path given, it returns a string
header_csv = headers_df.to_csv(index=False)
links_csv = links_df.to_csv(index=False)

with open("filename.csv", "a") as f:
    f.write(header_csv)
    f.write(links_csv)  # or any order

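See the pandas to_csv documentation and Python's built-in open() and write() for details. Note that this writes the two CSV blocks one after the other, so the links end up below the headers rather than in a second column.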
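If the goal is two side-by-side columns while get_search_attributes keeps returning two separate DataFrames, pd.concat with axis=1 joins them on their row index. A minimal sketch; the sample values are made up and results.csv is an arbitrary file name:

```python
import pandas as pd

# Two single-column DataFrames shaped like the ones get_search_attributes returns
# (sample values are made up for illustration).
headers_df = pd.DataFrame(["A", "B", "C"], columns=["headers"])
links_df = pd.DataFrame(["https://A", "https://B", "https://C"], columns=["links"])

# axis=1 aligns rows on the index and places the columns side by side.
combined = pd.concat([headers_df, links_df], axis=1)

# index=False leaves the 0, 1, 2 row labels out of the file.
combined.to_csv("results.csv", index=False)
print(combined)
```

If the two lists come back with different lengths, concat pads the shorter column with NaN instead of raising an error, which makes a mismatch easy to spot in the file.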

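You should put everything in one DataFrame using a dictionary (the headers and links lists must have the same length, otherwise pandas raises a ValueError):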

#headers_df = pd.DataFrame(headers, columns=["headers"])
#links_df = pd.DataFrame(links, columns=["links"])

df = pd.DataFrame({"headers": headers, "links": links})

df.to_csv(filename)
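The question's search_multiple_pages calls get_search_attributes on every page but never keeps what it returns. A minimal sketch of the one-DataFrame idea applied across pages, assuming a hypothetical scrape_page callable that stands in for the Selenium calls: it takes a page number and returns (headers, links) for that page, or None when there is no next page. The fake page data below exists only to exercise the function.

```python
import pandas as pd


def collect_all_pages(scrape_page, page_limit=5):
    """Gather (header, link) rows from up to page_limit + 1 pages.

    scrape_page is a hypothetical stand-in for the Selenium code: it
    receives the page number and returns (headers, links) lists, or
    None when there are no more pages.
    """
    rows = []
    for page in range(page_limit + 1):
        result = scrape_page(page)
        if result is None:
            break
        headers, links = result
        # zip pairs each title with its link and stops at the shorter list.
        rows.extend(zip(headers, links))
    # One DataFrame for every page, written once after the loop.
    return pd.DataFrame(rows, columns=["headers", "links"])


# Fake two-page result set (hypothetical data, not real search results).
fake_pages = [(["A", "B"], ["https://A", "https://B"]), (["C"], ["https://C"])]


def fake_scrape(page):
    return fake_pages[page] if page < len(fake_pages) else None


df = collect_all_pages(fake_scrape)
df.to_csv("results.csv", index=False)
print(df)
```

Pairing with zip avoids the ValueError from unequal list lengths, at the cost of silently dropping a title or link that has no partner; building the dictionary directly, as above, fails loudly instead.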

Example

import pandas as pd

df = pd.DataFrame({
    "Headers": ['A', 'B', 'C'], 
    "Links": ['https://A', 'https://B', 'https://C']
})

print(df)

df.to_csv('data.csv')

Result:

  Headers      Links
0       A  https://A
1       B  https://B
2       C  https://C
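By default, to_csv also writes the row index (the 0, 1, 2 printed above) as an unnamed first column of data.csv. A small check that writes to an in-memory io.StringIO buffer instead of a file shows the difference index=False makes:

```python
import io

import pandas as pd

df = pd.DataFrame({
    "Headers": ["A", "B", "C"],
    "Links": ["https://A", "https://B", "https://C"],
})

# Default: the index is written as an unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)

# index=False writes only the two data columns.
without_index = io.StringIO()
df.to_csv(without_index, index=False)

print(with_index.getvalue().splitlines()[0])     # ,Headers,Links
print(without_index.getvalue().splitlines()[0])  # Headers,Links
```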
