
How can I export my Python web scrape data to a specific sheet in an existing Excel file using pandas?

I have an Excel file with multiple sheets. I would like to add new data from Python to a new sheet in the same Excel file using pandas. Is this possible to do without affecting my previous data? I am new... Thanks for any help!

Here is the Python code I am using so far:

from urllib.request import urlopen
from lxml import html
import cssselect

response = urlopen("https://www.xyz.com.shtml")
content = response.read()
tree = html.fromstring(content)

for div in tree.cssselect('.first_name'):
    for a in div.cssselect('table:nth-child(2) a'):
        print(a.text)

I found this online... I am just a little confused how to use it in my current situation:

import pandas as pd

df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})

writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter')

df.to_excel(writer, sheet_name='Sheet1')

writer.close()  # writer.save() was removed in pandas 2.0

It seems you want to scrape a website and grab the table elements inside a class. I would suggest using BeautifulSoup instead.

Steps

  1. Grab the table inside the class
  2. Append the data to a dictionary
  3. Convert the dictionary to a DataFrame
  4. Export the DataFrame to Excel
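Steps 1 to 3 can be tried offline first. The sketch below is only an illustration: `SAMPLE_HTML` is made-up markup standing in for the real page, `links_to_frame` is a hypothetical helper name, and it assumes the links sit in the second table inside an element with class `first_name`, as the selector in the question suggests.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Made-up markup standing in for the real page (assumption, not the actual site).
SAMPLE_HTML = """
<div class="first_name">
  <table><tr><td>header table</td></tr></table>
  <table>
    <tr><td><a href="/a">Alice</a></td></tr>
    <tr><td><a href="/b">Bob</a></td></tr>
  </table>
</div>
"""


def links_to_frame(html_text):
    """Collect link texts from the second table under .first_name into a DataFrame."""
    soup = BeautifulSoup(html_text, "html.parser")
    # Step 1: select() accepts CSS selectors (find() does not).
    anchors = soup.select(".first_name table:nth-child(2) a")
    # Step 2: gather the link texts in a dictionary of columns.
    mydict = {"Data": [a.get_text(strip=True) for a in anchors]}
    # Step 3: convert the dictionary to a DataFrame.
    return pd.DataFrame(mydict)


df = links_to_frame(SAMPLE_HTML)
print(df)
```

Once the selector returns the expected rows on the sample, the same function can be fed the text of the real response.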

     from bs4 import BeautifulSoup
     import requests
     import pandas as pd

     req = requests.get('https://www.xyz.com.shtml')
     soup = BeautifulSoup(req.text, "lxml")

     # find() does not accept CSS selectors, so use select() instead
     content = soup.select(".first_name table:nth-child(2) a")

     # the list must exist before anything is appended to it
     mydict = {'Data': []}
     for c in content:
         # from c append data in mydict
         mydict['Data'].append(c.text)

     df = pd.DataFrame(mydict)

     writer = pd.ExcelWriter('pandas_simple.xlsx')
     df.to_excel(writer, sheet_name='Sheet1')
     writer.close()  # writer.save() was removed in pandas 2.0
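The export above creates or overwrites the whole file, so it does not yet cover the "existing workbook" part of the question. One way to add a sheet without touching the others is to open the writer in append mode. This is a minimal sketch, assuming pandas 1.3 or newer with openpyxl installed; `add_sheet`, the file name `existing.xlsx`, and the sheet name `Scraped` are placeholders, not names from the question.

```python
import os
import tempfile

import pandas as pd


def add_sheet(path, df, sheet_name):
    """Write df to sheet_name in the workbook at path, keeping the other sheets."""
    # mode="a" opens the existing workbook instead of overwriting it;
    # append mode needs the openpyxl engine.
    # if_sheet_exists="replace" only replaces a sheet with the same name.
    with pd.ExcelWriter(
        path, mode="a", engine="openpyxl", if_sheet_exists="replace"
    ) as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)


# Demo in a temporary directory so no real file is touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "existing.xlsx")  # placeholder workbook
    pd.DataFrame({"Data": [10, 20, 30]}).to_excel(
        path, sheet_name="Sheet1", index=False
    )
    scraped = pd.DataFrame({"Data": ["Alice", "Bob"]})  # stand-in for scraped rows
    add_sheet(path, scraped, "Scraped")
    with pd.ExcelFile(path) as xls:
        print(xls.sheet_names)
```

The file must already exist for append mode to work. With `if_sheet_exists="replace"`, rerunning the scrape refreshes only the named sheet.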
