How to read from an Excel sheet using Python's xlrd module

I have the following code. What I am trying to do is screen-scrape a website and then write the data to an Excel worksheet. I can't read the existing data from the Excel file.

import xlwt
import xlrd
from xlutils.copy import copy
from datetime import datetime
import urllib.request
from bs4 import BeautifulSoup
import re
import time
import os  
links= open('links.txt', encoding='utf-8')
#excel workbook
if os.path.isfile('./TestSheet.xls'):
    rbook=xlrd.open_workbook('TestSheet.xls',formatting_info=True)
    book=copy(rbook)
else:
    book = xlwt.Workbook()

try:
    # keep a reference to the new sheet so it can be used below
    sheet=book.add_sheet("wayanad")
except Exception:
    print("sheet exists")
    sheet=book.get_sheet(1)

for line in links:
    print("Currently Scanning\n","\n=================\n",line.rstrip())
    url=str(line.rstrip())    
    req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    html = urllib.request.urlopen(req)
    soup = BeautifulSoup(html,"html.parser")
    #print(soup.prettify())
    title=soup.find('h1').get_text()    
    data=[]
    for i in soup.find_all('p'):
       data.append(i.get_text())
    quick_descr=data[1].strip()
    category=data[2].strip()
    tags=data[3].strip()
    owner=data[4].strip()
    website=data[6].strip()
    full_description=data[7]
    address=re.sub(r'\s+', ' ', soup.find('h3').get_text()).strip()
    city=soup.find(attrs={"itemprop": "addressRegion"}).get_text().strip()
    postcode=soup.find(attrs={"itemprop": "postalCode"}).get_text().strip()
    phone=[]
    result=soup.findAll('h4')
    for h in result:
        if h.has_attr('itemprop'):
            phone.append(re.sub(r"\D", "", h.get_text()))

    #writing data to excel
    row=sheet.last_used_row
    column_count=sheet.ncols()    
    book.save("TestSheet.xls")
    time.sleep(2)           

The code explained

  • I have a links file containing many links, one per line. The script picks a line (a URL), visits that URL, and scrapes the data.
  • Open an Excel workbook and switch to a sheet for writing data.
  • Append the data to the Excel sheet.

Screenshot of the Excel sheet structure: [image]

Currently the list is empty, but I want to continue from the last row. I couldn't read data from a cell. The documentation says sheet.ncols is available to count the columns, but it throws an error:

>>>column_count=sheet.ncols()
>>>AttributeError: 'Worksheet' object has no attribute 'ncols'

What I want is a way to count rows and columns and to read data from a cell. Many tutorials are old. I am now using Python 3.4. I've already gone through these links and many others, but no luck:

Stack Overflow

Stack Overflow

Is this what you are looking for? Iterating over all the columns?

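import xlrd
xl_workbook = xlrd.open_workbook('TestSheet.xls')  # file name taken from the question
xl_sheet = xl_workbook.sheet_by_index(0)  # assumption: the data is on the first sheet

num_cols = xl_sheet.ncols  # ncols and nrows are attributes, not methods
for row_idx in range(0, xl_sheet.nrows):
    for col_idx in range(0, num_cols):
        print(row_idx, col_idx, xl_sheet.cell_value(row_idx, col_idx))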
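
To continue from the last row, as the question asks, one option is to take the row count from the xlrd sheet before copying the workbook. The sheet returned by get_sheet() on the copy is an xlwt Worksheet, which is what raises the AttributeError in the question. The following is only a sketch under some assumptions: the helper names read_rows and append_row are made up for illustration, the target is assumed to be an existing .xls file, and the sheet index defaults to 0.

```python
def read_rows(sheet):
    """Return every cell value of an xlrd-style sheet as a list of rows.

    Works on any object exposing nrows, ncols and cell_value(row, col),
    which is the interface of an xlrd sheet.
    """
    return [
        [sheet.cell_value(r, c) for c in range(sheet.ncols)]
        for r in range(sheet.nrows)
    ]


def append_row(path, values, sheet_index=0):
    """Append one row to an existing .xls file and return its row index.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Imported here so read_rows() stays usable without these packages.
    import xlrd
    from xlutils.copy import copy

    rbook = xlrd.open_workbook(path, formatting_info=True)
    # Row count comes from the xlrd (read) side; nrows is an attribute.
    next_row = rbook.sheet_by_index(sheet_index).nrows

    wbook = copy(rbook)                    # writable xlwt workbook
    wsheet = wbook.get_sheet(sheet_index)  # xlwt Worksheet, has no ncols
    for col, value in enumerate(values):
        wsheet.write(next_row, col, value)
    wbook.save(path)
    return next_row
```

In the scraping loop this could be called once per URL, for example append_row('TestSheet.xls', [title, category, tags]). Reopening the file on every call is slow for long link lists, so reading nrows once and incrementing a counter may be preferable.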
