

How to write to an existing Excel file without overwriting existing data using pandas

I know similar questions have been posted before, but I haven't found anything that works for this case. I hope you can help.

Here is a summary of the issue:

  1. I'm writing web-scraping code using Selenium (for an assignment).
  2. The code uses a for-loop to go from one page to the next.
  3. For each page number, the output of the code is a DataFrame (basically a table) that is exported to Excel.
  4. The DataFrames from all the web pages should be captured in one Excel sheet only (not in multiple sheets within the Excel file).
  5. Each web page has the same data format (i.e. the number of columns and the column headers are the same, but the row values vary).
  6. For info, I'm using pandas because it helps convert the output from the website to Excel.

The problem I'm facing is that when the DataFrame is exported to Excel, it overwrites the data from the previous iteration. Hence, when I run the code and the scraping is complete, I only get the data from the last for-loop iteration.
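The overwrite described above can be reproduced without a browser. A minimal sketch, assuming openpyxl is installed (pandas uses it for `.xlsx` files); the column names, values and temporary file are made up for illustration. `DataFrame.to_excel(path)` creates a fresh workbook at that path, so each call replaces whatever the previous call wrote:

```python
import os
import tempfile

import pandas as pd

# Two hypothetical "pages" with the same columns, standing in for scraped data
page1 = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})
page2 = pd.DataFrame({"name": ["c"], "value": [3]})

path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")

# Each to_excel(path) call writes a brand-new workbook at `path`,
# so the second call throws away everything written by the first
page1.to_excel(path, index=False)
page2.to_excel(path, index=False)

result = pd.read_excel(path)
print(result)  # only page2's single row is left
```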

Please advise which line(s) of code I need to add so that all the iterations are captured in the Excel sheet. In other words, and more specifically, each iteration should export its data to Excel starting from the first empty row.
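If the file really has to be updated page by page (for example, to keep partial results if the scraper crashes), one option is to open the workbook in append mode and start writing at the sheet's first empty row. A sketch, assuming pandas >= 1.4 (for `if_sheet_exists="overlay"`) with the openpyxl engine; `append_df_to_excel`, the sheet name and the sample data are hypothetical, not part of pandas or the original code:

```python
import os
import tempfile

import pandas as pd


def append_df_to_excel(df, path, sheet_name="Sheet1"):
    """Hypothetical helper: write df below the last used row of sheet_name.

    The first call creates the file with a header row; later calls add only
    data rows, starting at the first empty row of the same sheet.
    """
    if not os.path.exists(path):
        df.to_excel(path, sheet_name=sheet_name, index=False)
        return
    # mode="a" opens the existing workbook instead of replacing it;
    # if_sheet_exists="overlay" writes into the existing sheet
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        # openpyxl's max_row is the 1-based number of the last used row,
        # which equals the 0-based startrow of the first empty row
        startrow = writer.book[sheet_name].max_row
        df.to_excel(writer, sheet_name=sheet_name, startrow=startrow,
                    index=False, header=False)


# Usage: three hypothetical pages of two rows each go into one sheet
path = os.path.join(tempfile.mkdtemp(), "pages.xlsx")
for page in range(3):
    append_df_to_excel(pd.DataFrame({"page": [page, page], "value": [1, 2]}), path)

result = pd.read_excel(path)
print(result)
```

Reopening the workbook on every page is slower than collecting all the DataFrames and writing once, so this is mainly worth it when intermediate results must be on disk.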

Here is an extract from the code:

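import random
import time

import pandas as pd
from selenium.webdriver.common.by import By

# driver, urlA and xlpath are defined earlier in the script
for i in range(50, 60):
    url = urlA + str(i)  # URL generator; urlA is the main link excluding pagination

    driver.get(url)

    time.sleep(random.randint(3, 7))  # random pause between page loads

    text = driver.find_element(By.XPATH, '/html/body/pre').text

    data = pd.DataFrame(eval(text))

    # to_excel(path) writes a new file each time, replacing the previous page's data
    data.to_excel(xlpath)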

Thanks Dijkgraaf. Your proposal worked.

Here is the full code for others (for future reference).

Apologies for the formatting; I couldn't set it properly. Anyway, I hope the code below is of some use to someone in the future.

import pandas as pd
from selenium.webdriver.common.by import By

xlpath = "c:/projects/excelfile.xlsx"

# Placeholder: the main link of your website, excluding the page number
urlA = "https://www.yourwebsite.com/"

# DataFrame.append was removed in pandas 2.0, so collect one DataFrame
# per page in a list (empty before the for-loop starts) and combine them once
frames = []

# driver is an already-started Selenium WebDriver (setup not shown)
for i in range(1, 10):
    url = urlA + str(i)  # URL generator for pagination (loops through the pages)

    driver.get(url)

    text = driver.find_element(By.XPATH, '/html/body/pre').text  # gets the text from the site

    data = pd.DataFrame(eval(text))  # evaluates the extracted text and converts it to a pandas DataFrame

    frames.append(data)  # keeps this page's rows; nothing is written to disk yet

# Stack all pages into one DataFrame and export it to a single Excel sheet
df = pd.concat(frames, ignore_index=True)
df.to_excel(xlpath)
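Keeping the parsing separate from the browser makes the "collect, then write once" approach easy to check without Selenium. A sketch, assuming each page's `<pre>` text is a Python literal such as a list of dicts; `pages_to_dataframe` and the sample strings are hypothetical. It swaps in `ast.literal_eval`, which only evaluates literals, as a safer stand-in for `eval`; if the site actually returns JSON (with `true`/`false`/`null`), `json.loads` would be the matching parser instead.

```python
import ast

import pandas as pd


def pages_to_dataframe(page_texts):
    """Hypothetical helper: parse each page's text and stack the rows.

    Assumes every text is a Python literal (e.g. a list of dicts);
    ast.literal_eval rejects anything that is not a literal.
    """
    frames = [pd.DataFrame(ast.literal_eval(text)) for text in page_texts]
    # All pages share the same columns, so concat simply stacks their rows;
    # ignore_index=True gives one running 0..n-1 index instead of repeats
    return pd.concat(frames, ignore_index=True)


# Hypothetical page texts standing in for the scraped <pre> text
pages = [
    "[{'name': 'a', 'value': 1}, {'name': 'b', 'value': 2}]",
    "[{'name': 'c', 'value': 3}]",
]
df = pages_to_dataframe(pages)
print(df)
```

Calling `df.to_excel(xlpath, index=False)` on the combined result then writes every page to one sheet in a single pass.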

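Disclaimer: The technical posts on this site follow the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.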
