
How to write to an existing Excel file without overwriting existing data using pandas

I know similar questions have been posted before, but I haven't found anything that works for this case. I hope you can help.

Here is a summary of the issue:

  1. I am writing web-scraping code using Selenium (for an assignment).
  2. The code uses a for loop to go from one page to the next.
  3. The output for each page is a DataFrame (basically a table) that is exported to Excel.
  4. The DataFrames from all the web pages should end up in a single Excel sheet (not multiple sheets within the Excel file).
  5. Each web page has the same data format (i.e. the number of columns and the column headers are the same, but the row values vary).
  6. For info, I am using pandas because it helps convert the output from the website to Excel.

The problem I'm facing is that each time the DataFrame is exported to Excel, it overwrites the data from the previous iteration. Hence, when I run the code and the scraping is completed, I only get the data from the last iteration of the for loop.

Please advise which line(s) of code I need to add so that all the iterations are captured in the Excel sheet. More specifically, each iteration should export its data to Excel starting from the first empty row.

Here is an extract from the code:

for i in range(50, 60):
    url = urlA + str(i)  # URL generator; urlA is the main link excluding the page number
    driver.get(url)
    time.sleep(random.randint(3, 7))  # random pause between page loads
    text = driver.find_element_by_xpath('/html/body/pre').text
    data = pd.DataFrame(eval(text))
    export_excel = data.to_excel(xlpath)  # rewrites the whole file on every iteration
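For the literal request (each iteration writing from the first empty row of the existing sheet), one possible sketch follows. It assumes pandas 1.4 or later with the openpyxl engine installed; the helper name `append_to_sheet` and the default sheet name `Sheet1` are illustrative choices, not part of the original code.

```python
import os

import pandas as pd


def append_to_sheet(frame, path, sheet_name="Sheet1"):
    """Write `frame` below whatever rows `sheet_name` in `path` already holds."""
    if not os.path.exists(path):
        # First call: create the workbook and write the column headers once.
        frame.to_excel(path, sheet_name=sheet_name, index=False, engine="openpyxl")
        return

    # mode="a" opens the existing workbook instead of replacing it;
    # if_sheet_exists="overlay" writes into the existing sheet (pandas 1.4+).
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        sheet = writer.sheets.get(sheet_name)
        # max_row is the last used row (1-based), which is also the
        # zero-based position of the first empty row that startrow expects.
        first_empty_row = sheet.max_row if sheet is not None else 0
        frame.to_excel(
            writer,
            sheet_name=sheet_name,
            index=False,
            header=sheet is None,  # write headers only when the sheet is new
            startrow=first_empty_row,
        )
```

Inside the loop, `data.to_excel(xlpath)` would become `append_to_sheet(data, xlpath)`. Reopening the workbook on every iteration is slower than collecting the pages in memory and writing once, so this is mainly worth it when the file must keep rows from earlier runs.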

Thanks Dijkgraaf. Your proposal worked.

Here is the full code for others (for future reference).

Apologies for the font, I couldn't set it properly. Anyway, I hope the code below is of some use to someone in the future.

import pandas as pd
from selenium.webdriver.common.by import By

xlpath = "c:/projects/excelfile.xlsx"
urlA = "www.your website.com"  # placeholder: main link excluding the page number

frames = []  # one DataFrame per page, collected before anything is written

# driver is the Selenium WebDriver created earlier (not shown in this extract)
for i in range(1, 10):
    url = urlA + str(i)  # URL generator for pagination (loops through the pages)
    driver.get(url)
    # find_element_by_xpath was removed in newer Selenium releases; this is the replacement
    text = driver.find_element(By.XPATH, '/html/body/pre').text  # gets text from site
    data = pd.DataFrame(eval(text))  # evaluates the extracted text and converts it to a DataFrame
    frames.append(data)  # DataFrame.append was removed in pandas 2.0, so collect in a list

df = pd.concat(frames, ignore_index=True)  # consolidates all pages and renumbers the rows
df.to_excel(xlpath)  # exports the consolidated DataFrame to Excel once, after the loop
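A side note on the `eval(text)` call: `eval` executes whatever the page returns as Python code. If the `<pre>` element holds JSON or a plain Python literal (an assumption about the site, since the actual response is not shown), the text could be parsed without executing it. The helper name `parse_page_text` below is hypothetical.

```python
import ast
import json

import pandas as pd


def parse_page_text(text):
    """Turn the text of a page's <pre> element into a DataFrame without eval."""
    try:
        # Most endpoints that render inside <pre> return JSON.
        records = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to Python literal syntax (e.g. single-quoted strings).
        # literal_eval only accepts literals, so no code is executed.
        records = ast.literal_eval(text)
    return pd.DataFrame(records)
```

With this in place, `pd.DataFrame(eval(text))` in the loop would become `parse_page_text(text)`.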


 