I'm new to Python and made a simple scraper that logs into several analytics accounts and writes some data to a CSV file. Each row I write comes from a dictionary that I create with the following code:
import csv
import os
import time
from collections import OrderedDict

def save_file(website, visitors, links, sources):
    date = time.strftime("%d/%m/%Y")
    d = OrderedDict()
    d['Title'] = website      # website string
    d['Date'] = date          # date string
    d['Visitors'] = visitors  # integer
    d['Links'] = links        # dictionary of links - URL: Clicks
    d['Sources'] = sources    # dictionary of sources - Source: Clicks
    path = os.path.expanduser('~/Desktop/Traffic Report.csv')
    with open(path, 'a') as f:
        # The keys of d double as the field names
        writer = csv.DictWriter(f, d, delimiter=',')
        writer.writerow(d)
When I write to the CSV using this code, the site, date, and visitors cells work great. The links and sources cells (data I'm scraping with BeautifulSoup) are full of extra quotation marks and characters, as seen below.
{"['www.example1.com/']": '1', "['www.example2.com']": '1', "['www.example3.com']": '1', "['www.example4.com/']": '3', "['www.example5.com/']": '1'}
{"['Links']": '2', "['Social media']": '5', "['Direct']": '2', "['Searches']": '1'}
Is there any way to remove many of these characters and print to csv as: www.example1.com : 1, www.example2.com : 1, www.example3.com : 1...
Any help would be greatly appreciated!
You'd have to do the formatting yourself. Instead of a dictionary, build a string:
d['Links'] = ', '.join(['{}: {}'.format(*item) for item in links.items()])
d['Sources'] = ', '.join(['{}: {}'.format(*item) for item in sources.items()])
This produces results of the form link1: count1, link2: count2.
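As a quick check of that formatting step, here is a minimal sketch. The sample `links` data and the `format_counts` helper name are made up for illustration; in the real scraper the dictionary would come from the scraped page.

```python
from collections import OrderedDict

# Hypothetical sample data standing in for the scraped link counts.
links = OrderedDict([("www.example1.com", 1), ("www.example2.com", 3)])

def format_counts(counts):
    """Join the key/value pairs into a 'key: value, key: value' string."""
    return ", ".join("{}: {}".format(key, value) for key, value in counts.items())

formatted = format_counts(links)
print(formatted)  # www.example1.com: 1, www.example2.com: 3
```

Because the result is a plain string, the csv module writes it as a single cell and only adds quotes around it when it contains the delimiter.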
As a side note, you don't need to use an OrderedDict object here; just give the DictWriter a sequence of keys in the order you want them written instead. I'd also open the CSV file just once, outside the loop:
d = {
    'Title': website,
    'Date': date,
    'Visitors': visitors,
    'Links': ', '.join(['{}: {}'.format(*item) for item in links.items()]),
    'Sources': ', '.join(['{}: {}'.format(*item) for item in sources.items()]),
}
path = os.path.expanduser('~/Desktop/Traffic Report.csv')
with open(path, 'a') as f:
    # The fields tuple fixes the column order, so a plain dict is enough
    fields = ('Title', 'Date', 'Visitors', 'Links', 'Sources')
    writer = csv.DictWriter(f, fields, delimiter=',')
    writer.writerow(d)
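The advice about opening the file once, outside the loop, could be sketched roughly as follows. The `write_report` helper and the sample `rows` are hypothetical, and an in-memory `io.StringIO` buffer stands in for the real file so the example can run anywhere.

```python
import csv
import io

FIELDS = ('Title', 'Date', 'Visitors', 'Links', 'Sources')

def write_report(f, rows):
    """Write every row dict to the already-open file object f."""
    writer = csv.DictWriter(f, FIELDS, delimiter=',')
    for row in rows:
        writer.writerow(row)

# Hypothetical rows, one per scraped site, with the dictionaries
# already formatted as strings.
rows = [
    {'Title': 'Site A', 'Date': '01/01/2015', 'Visitors': 10,
     'Links': 'www.example1.com: 1', 'Sources': 'Direct: 2'},
    {'Title': 'Site B', 'Date': '02/01/2015', 'Visitors': 7,
     'Links': 'www.example2.com: 3', 'Sources': 'Direct: 2, Searches: 1'},
]

buffer = io.StringIO()  # stand-in for the file opened once
write_report(buffer, rows)
output = buffer.getvalue()
print(output)
```

With a real file on Python 3, the same helper would be called inside a single `with open(path, 'a', newline='') as f:` block, so the file is opened once however many sites are scraped.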
def convert(dct):
    return ", ".join("%s : %s" % (key, value) for key, value in dct.iteritems())
(use .items() instead of .iteritems() on Python 3.x) and then
d['Links'] = convert(links)
d['Sources'] = convert(sources)
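The sample output in the question suggests that the dictionary keys are themselves strings such as "['Links']", which is what a one-element list looks like after being passed through str(). That is only a guess about how the scraper builds its keys. Assuming it holds, a sketch like the one below would strip the brackets and quotes before formatting; `clean_key` is a hypothetical helper, and fixing the scraper so that it stores the plain string in the first place would be cleaner.

```python
import ast

def clean_key(key):
    """Return the single item if key looks like "['item']", else key unchanged."""
    try:
        parsed = ast.literal_eval(key)
    except (ValueError, SyntaxError, TypeError):
        # Not a Python literal (e.g. a plain URL), so leave it alone.
        return key
    if isinstance(parsed, list) and len(parsed) == 1:
        return parsed[0]
    return key

def convert(dct):
    """Format a dict as 'key : value, key : value' with cleaned keys."""
    return ", ".join("%s : %s" % (clean_key(key), value)
                     for key, value in dct.items())

# Sample data copied from the shape shown in the question.
sources = {"['Links']": '2', "['Direct']": '2'}
result = convert(sources)
print(result)  # Links : 2, Direct : 2
```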