
Issues with outputting the scraped data to a csv file using python and beautiful soup

I am trying to output the scraped data from a website into a csv file. At first I was running into a UnicodeEncodeError, but after using this piece of code:

if __name__ == "__main__":
    reload(sys)
    sys.setdefaultencoding("utf-8")

I am able to generate the csv; below is the code:

import csv
import urllib2
import sys  
from bs4 import BeautifulSoup
if __name__ == "__main__":
    reload(sys)
    sys.setdefaultencoding("utf-8")
page = urllib2.urlopen('http://www.att.com/shop/wireless/devices/smartphones.html').read()
soup = BeautifulSoup(page)
soup.prettify()
for anchor in soup.findAll('a', {"class": "clickStreamSingleItem"}):
    print anchor['title']
    with open('Smartphones.csv', 'wb') as csvfile:
        spamwriter = csv.writer(csvfile, delimiter=',')
        spamwriter.writerow([anchor['title']])

But I am getting only one device name in the output csv. I don't have any programming background, so please pardon my ignorance. Can you please help me pinpoint the issue?

That's to be expected; you recreate the file from scratch each time you find an element. Open the file only once, before looping over the links, then write a row for each anchor you find:

with open('Smartphones.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',')
    # The file is opened once, so each anchor appends a new row instead of
    # overwriting the previous one
    for anchor in soup.findAll('a', {"class": "clickStreamSingleItem"}):
        print anchor['title']
        spamwriter.writerow([anchor['title'].encode('utf8')])

Opening a file for writing with w clears the file first, and you were doing that for each anchor.
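
To see the effect in isolation, here is a minimal sketch (the demo.csv filename and row values are made up for illustration): each open in write mode truncates the file, so only the last row written survives.

import csv

# Each open() in 'w'/'wb' mode truncates the file, so the first row is lost
with open('demo.csv', 'wb') as f:
    csv.writer(f).writerow(['first'])
with open('demo.csv', 'wb') as f:
    csv.writer(f).writerow(['second'])
# demo.csv now contains a single row: "second"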

As for your unicode error: please avoid, at all costs, changing the default encoding. Instead, encode your rows properly; I did so in the example above, so you can remove the whole .setdefaultencoding() call (and the reload() before it).
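
As a minimal sketch of that pattern (the title string and out.csv filename are hypothetical): BeautifulSoup hands back unicode strings, and the Python 2 csv module expects byte strings, so encode each value to UTF-8 at the point of writing.

# -*- coding: utf-8 -*-
import csv

# Hypothetical unicode value, standing in for anchor['title'] from BeautifulSoup
title = u'Nokia Lumia 920 \u2013 Black'

with open('out.csv', 'wb') as f:
    writer = csv.writer(f)
    # Encode to UTF-8 bytes just before writing;
    # no reload(sys)/sys.setdefaultencoding() needed
    writer.writerow([title.encode('utf8')])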
