
Web scraping and transferring data into Excel using Python

I am able to scrape all the material I need; the problem is that I can't get the data into Excel.

from lxml import html
import requests
import xlsxwriter

page = requests.get('website that gets mined')
tree = html.fromstring(page.content)

items = tree.xpath('//h4[@class="item-title"]/text()')
prices = tree.xpath('//span[@class="price"]/text()')
description = tree.xpath('//div[@class="description text"]/text()')
print 'items: ', items
print 'Prices: ', prices
print 'description', description

Everything works until this section, where I try to get the data into Excel. This is the error message:

for items,prices,description in (array):
ValueError: too many values to unpack
Exception Exception: Exception('Exception caught in workbook destructor. Explicit close() may be required for workbook.',) in <bound method Workbook.__del__ of <xlsxwriter.workbook.Workbook object at 0x104735e10>> ignored
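The ValueError comes from the shape of the loop rather than from xlsxwriter: `array` is a list of three lists, so each pass of the loop yields one whole inner list and tries to unpack it into three names. A minimal sketch with made-up sample data (the real lists come from the XPath queries above):

```python
# Sample data standing in for the scraped lists (assumed values):
items = ['hat', 'scarf', 'mug', 'pen']
prices = ['9.99', '4.50', '7.00', '1.25']
description = ['wool hat', 'silk scarf', 'tea mug', 'gel pen']

array = [items, prices, description]

# Each iteration yields ONE inner list; unpacking a 4-element list
# into three names raises the "too many values to unpack" error.
try:
    for a, b, c in array:
        pass
except ValueError as exc:
    print(exc)  # too many values to unpack (expected 3)

# zip() pairs the lists element-wise, one (item, price, desc) per row:
print(list(zip(items, prices, description))[0])  # ('hat', '9.99', 'wool hat')
```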

This is what it was trying to do:

array = [items,prices,description]
workbook   = xlsxwriter.Workbook('test1.xlsx')
worksheet = workbook.add_worksheet()
row = 0
col = 0

for items,prices,description in (array):
    worksheet.write(row, col, items)
    worksheet.write(row, col + 1, prices)
    worksheet.write(row, col + 2, description)
    row += 1
workbook.close()

Assuming `items`, `prices` and `description` all have the same length, you can rewrite the last part of the code:

for item,price,desc in zip(items,prices,description):
    worksheet.write(row, col, item)
    worksheet.write(row, col + 1, price)
    worksheet.write(row, col + 2, desc)
    row += 1

If the lists can have different lengths, you should look into alternatives to the zip method, but in that case I would be worried about the consistency of the data.
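As a sketch of one such alternative: the standard library's `itertools.zip_longest` keeps every row and pads the gaps, so a length mismatch shows up in the output instead of being silently truncated (the sample values here are made up):

```python
from itertools import zip_longest

items = ['hat', 'scarf', 'mug']
prices = ['9.99', '4.50']  # one price missing

# zip() stops at the shortest input, silently dropping 'mug':
print(list(zip(items, prices)))

# zip_longest() keeps every row, padding with a fill value, so the
# mismatch is visible in the spreadsheet instead of being lost:
print(list(zip_longest(items, prices, fillvalue='N/A')))
```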

Inevitably, writing to a CSV file or a text file is going to be easier than writing to an Excel file.
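For instance, the same three lists can be written with the standard library's `csv` module in a few lines (Python 3; the file name and sample data are placeholders):

```python
import csv

# Assumed sample data standing in for the scraped lists:
items = ['hat', 'scarf']
prices = ['9.99', '4.50']
description = ['wool hat', 'silk scarf']

with open('test1.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['item', 'price', 'description'])  # header row
    writer.writerows(zip(items, prices, description))  # one row per product
```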

import urllib2

listOfStocks = ["AAPL", "MSFT", "GOOG", "FB", "AMZN"]

urls = []

for company in listOfStocks:
    urls.append('http://real-chart.finance.yahoo.com/table.csv?s=' + company + '&d=6&e=28&f=2015&g=m&a=11&b=12&c=1980&ignore=.csv')

Output_File = open('C:/your_path_here/Data.csv','w')

New_Format_Data = ''

for counter in range(0, len(urls)):

    Original_Data = urllib2.urlopen(urls[counter]).read()

    if counter == 0:
        New_Format_Data = "Company," + urllib2.urlopen(urls[counter]).readline()

    rows = Original_Data.splitlines(1)

    for row in range(1, len(rows)):

        New_Format_Data = New_Format_Data + listOfStocks[counter] + ',' + rows[row]

Output_File.write(New_Format_Data)
Output_File.close()
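The reshaping step in that loop (prepend a Company column and skip each file's own header) can be shown on in-memory sample data with no network access; the tickers and column values here are assumptions for illustration:

```python
# Downloaded CSV text per ticker, faked in memory (assumed values):
raw_csv = {
    'AAPL': 'Date,Close\n2015-06-01,130.28\n2015-05-01,128.95\n',
    'MSFT': 'Date,Close\n2015-06-01,44.15\n2015-05-01,46.86\n',
}

lines_out = ['Company,Date,Close']  # single combined header
for ticker, text in raw_csv.items():
    rows = text.splitlines()
    for row in rows[1:]:            # skip each file's own header row
        lines_out.append(ticker + ',' + row)

print('\n'.join(lines_out))
```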

Or:

from bs4 import BeautifulSoup
import urllib2

var_file = urllib2.urlopen("http://www.imdb.com/chart/top")

var_html  = var_file.read()

text_file = open("C:/your_path_here/Text1.txt", "wb")
var_file.close()
soup = BeautifulSoup(var_html)
for item in soup.find_all(class_='lister-list'):
    for link in item.find_all('a'):
        #print(link)
        z = str(link)
        text_file.write(z + "\r\n")
text_file.close()

As a developer it is hard to manipulate Excel files programmatically, because Excel is proprietary. This is especially true for languages other than .NET. CSV files, on the other hand, are easy to manipulate programmatically, because they are, after all, simple text files.
