
How to read from an Excel sheet using Python's xlrd module

I have the following code. What I am trying to do is scrape a website and then write the data to an Excel worksheet. I can't read the existing data from the Excel file.

import xlwt
import xlrd
from xlutils.copy import copy
from datetime import datetime
import urllib.request
from bs4 import BeautifulSoup
import re
import time
import os  
links= open('links.txt', encoding='utf-8')
#excel workbook
if os.path.isfile('./TestSheet.xls'):
    rbook=xlrd.open_workbook('TestSheet.xls',formatting_info=True)
    book=copy(rbook)
else:
    book = xlwt.Workbook()

try:
    sheet=book.add_sheet("wayanad")
except Exception:
    # add_sheet raises when a sheet with this name already exists
    print("sheet exists")
    sheet=book.get_sheet(1)

for line in links:
    print("Currently Scanning\n","\n=================\n",line.rstrip())
    url=str(line.rstrip())    
    req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    html = urllib.request.urlopen(req)
    soup = BeautifulSoup(html,"html.parser")
    #print(soup.prettify())
    title=soup.find('h1').get_text()    
    data=[]
    for i in soup.find_all('p'):
       data.append(i.get_text())
    quick_descr=data[1].strip()
    category=data[2].strip()
    tags=data[3].strip()
    owner=data[4].strip()
    website=data[6].strip()
    full_description=data[7]
    address=re.sub(r'\s+', ' ', soup.find('h3').get_text()).strip()
    city=soup.find(attrs={"itemprop": "addressRegion"}).get_text().strip()
    postcode=soup.find(attrs={"itemprop": "postalCode"}).get_text().strip()
    phone=[]
    result=soup.findAll('h4')
    for h in result:
        if h.has_attr('itemprop'):
            phone.append(re.sub(r"\D", "", h.get_text()))

    #writing data to excel
    row=sheet.last_used_row
    column_count=sheet.ncols()    
    book.save("TestSheet.xls")
    time.sleep(2)           

The code explained

  • I have a links file with many links, one per line. The script picks a line (URL), goes to that URL, and scrapes the data.
  • Open an Excel workbook and switch to a sheet for writing data.
  • Append the data to the Excel sheet.
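The first step, together with the text clean-up used inside the scraping loop, can be sketched with the standard library alone. This is only an illustration under stated assumptions: iter_links, clean_text and digits_only are hypothetical helper names that do not appear in the code above, and links.txt is assumed to hold one URL per line, possibly with blank lines.

```python
import re


def iter_links(path):
    """Yield each non-empty line of the links file, stripped of whitespace."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            url = line.strip()
            if url:  # skip blank lines instead of requesting an empty URL
                yield url


def clean_text(text):
    """Collapse runs of whitespace into single spaces (raw-string pattern)."""
    return re.sub(r"\s+", " ", text).strip()


def digits_only(text):
    """Keep only the digits, e.g. to normalise a scraped phone number."""
    return re.sub(r"\D", "", text)
```

Opening the file in a with block closes it even if a request fails part-way through, and the raw-string patterns avoid the invalid-escape warnings that newer Python versions emit for '\s' and '\D'.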

[Screenshot of the Excel sheet structure]

Currently the list is empty, but I want to continue from the last row. I couldn't read data from the cells. The documentation says sheet.ncols is available to count the columns, but it throws an error:

>>> column_count=sheet.ncols()
AttributeError: 'Worksheet' object has no attribute 'ncols'

What I want is a way to count rows and columns and to read data from a cell. Many tutorials are old, and I am now using Python 3.4. I've already gone through these links and many others, but no luck:

Stack Overflow

Stack Overflow

Is this what you are looking for? Going through all the columns?

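import xlrd

xl_workbook = xlrd.open_workbook('TestSheet.xls')
xl_sheet = xl_workbook.sheet_by_index(0)  # assumption: the data is in the first sheet

num_cols = xl_sheet.ncols  # ncols and nrows are attributes, not methods
for row_idx in range(0, xl_sheet.nrows):
    print(xl_sheet.row_values(row_idx))  # assumption: print each row's values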
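The AttributeError in the question most likely arises because copy(rbook) returns an xlwt workbook, and the object that book.get_sheet() hands back is a write-only xlwt Worksheet. The nrows and ncols attributes and the cell_value method belong to the sheet objects of the xlrd workbook (rbook), so the dimensions have to be read there before writing through the copy. The following is a minimal sketch of that idea, not a definitive implementation: read_rows, next_free_row, append_record and open_for_append are hypothetical helper names, and it assumes an .xls file handled by xlrd, xlwt and xlutils as in the question.

```python
import os


def read_rows(read_sheet):
    """Return every row of an xlrd-style sheet as a list of cell values."""
    return [
        [read_sheet.cell_value(r, c) for c in range(read_sheet.ncols)]
        for r in range(read_sheet.nrows)
    ]


def next_free_row(read_sheet):
    """Rows are 0-indexed, so nrows is also the index of the first empty row."""
    return read_sheet.nrows


def append_record(write_sheet, row_idx, values):
    """Write values left to right into row_idx; return the next row index."""
    for col_idx, value in enumerate(values):
        write_sheet.write(row_idx, col_idx, value)
    return row_idx + 1


def open_for_append(path, sheet_name):
    """Return (book, write_sheet, next_row) ready for appending to sheet_name.

    The third-party imports live here so the helpers above can be used
    (and tested) without xlrd, xlwt or xlutils installed.
    """
    import xlrd
    import xlwt
    from xlutils.copy import copy

    if not os.path.isfile(path):
        book = xlwt.Workbook()
        return book, book.add_sheet(sheet_name), 0

    rbook = xlrd.open_workbook(path, formatting_info=True)
    book = copy(rbook)  # an xlwt.Workbook: writable, but no nrows/ncols
    names = rbook.sheet_names()
    if sheet_name in names:
        idx = names.index(sheet_name)
        # Read the size from the xlrd sheet, write through the xlwt copy.
        return book, book.get_sheet(idx), next_free_row(rbook.sheet_by_index(idx))
    return book, book.add_sheet(sheet_name), 0
```

In the question's script this could be called once before the loop, for example book, sheet, row = open_for_append('TestSheet.xls', 'wayanad'), followed by row = append_record(sheet, row, [title, quick_descr, category]) for each link and a single book.save('TestSheet.xls') at the end. Looking the sheet up by name also avoids the hard-coded get_sheet(1).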
