
Issues with outputting scraped data to a CSV file using Python and Beautiful Soup

I am trying to write data scraped from a website to a CSV file. At first I ran into a Unicode encoding error, but after adding this piece of code:

if __name__ == "__main__":
    reload(sys)
    sys.setdefaultencoding("utf-8")

I am able to generate the CSV. Here is the full code:

import csv
import urllib2
import sys  
from bs4 import BeautifulSoup
if __name__ == "__main__":
    reload(sys)
    sys.setdefaultencoding("utf-8")
page = urllib2.urlopen('http://www.att.com/shop/wireless/devices/smartphones.html').read()
soup = BeautifulSoup(page)
soup.prettify()
for anchor in soup.findAll('a', {"class": "clickStreamSingleItem"}):
    print anchor['title']
    with open('Smartphones.csv', 'wb') as csvfile:
        spamwriter = csv.writer(csvfile, delimiter=',')
        spamwriter.writerow([(anchor['title'])])

But I am getting only one device name in the output CSV. I don't have a programming background, so please pardon my ignorance. Can you help me pinpoint the issue?

That's to be expected; you write the file from scratch each time you find an element. Open the file only once before looping over the links, then write rows for each anchor you find:

with open('Smartphones.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',')        
    for anchor in soup.findAll('a', {"class": "clickStreamSingleItem"}):
        print anchor['title']        
        spamwriter.writerow([anchor['title'].encode('utf8')])   

Opening a file for writing in 'w' mode (here 'wb') truncates it first, and you were doing that for each anchor, so only the last row you wrote survived.
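
To see the truncation in isolation, here is a minimal sketch without any scraping. The function names, the temporary file, and the sample device names are all made up for illustration; plain text writes stand in for the csv writer.

```python
import os
import tempfile


def write_reopening_each_time(path, names):
    # Mirrors the question: the file is reopened in 'w' mode for every
    # item, so each iteration truncates whatever was written before.
    for name in names:
        with open(path, 'w') as f:
            f.write(name + '\n')


def write_opening_once(path, names):
    # Mirrors the fix: open once, then write every item inside the loop.
    with open(path, 'w') as f:
        for name in names:
            f.write(name + '\n')


def read_lines(path):
    with open(path) as f:
        return f.read().splitlines()


# Hypothetical sample data, not taken from the scraped page.
names = ['Phone A', 'Phone B', 'Phone C']
path = os.path.join(tempfile.mkdtemp(), 'demo.csv')

write_reopening_each_time(path, names)
only_last = read_lines(path)   # just the last name is left

write_opening_once(path, names)
all_rows = read_lines(path)    # all three names are present
```

After the first call the file holds only 'Phone C'; after the second it holds all three names, which is the behaviour you want for Smartphones.csv.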

As for your Unicode error, please avoid changing the default encoding at all costs. Instead, encode your rows explicitly, as I did in the example above with .encode('utf8'). You can then remove the whole sys.setdefaultencoding() call (and the reload(sys) before it).
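
If a row has several columns, the explicit encoding step could be pulled into a small helper. This is only a sketch: encode_row is a hypothetical name, not part of the csv module, and the sample title is invented rather than taken from the page.

```python
# -*- coding: utf-8 -*-


def encode_row(row, encoding='utf-8'):
    """Return a copy of row with every unicode cell encoded to bytes.

    Cells that are not unicode text (byte strings, numbers) pass through
    unchanged.
    """
    text_type = type(u'')  # unicode on Python 2, str on Python 3
    return [cell.encode(encoding) if isinstance(cell, text_type) else cell
            for cell in row]


# Invented sample title containing a non-ASCII character (the (R) sign).
encoded = encode_row([u'Example Phone\u00ae', 42])
```

Under Python 2 you would pass the result straight to the writer, for example spamwriter.writerow(encode_row([anchor['title']])). Under Python 3 this step is not needed: there you would open the file in text mode with encoding='utf-8' and newline='' and write the strings as they are.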

