I would like to read the following text file:
date candy
1/12/2011 300
1/20/2010 200
1/16/2010 200
into a list of dictionaries as follows:
candysales= [ {'date': d(2011,1,12), 'sales': 300}, {'date': d(2010,1,20), 'sales': 200},{'date': d(2010,1,16), 'sales': 200}]
Does anyone have any ideas of how to begin doing this, or any resources that I can look at?
You can use csv.DictReader, which reads a CSV file, uses the first row as the dictionary key names, and parses each following row into a dictionary (on Python versions before 3.7 you will lose field order, as plain dictionaries there are not reliably ordered). You can then convert the date from a string to a datetime.date object using datetime.datetime's strptime method and then calling date() on the result:
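import csv
from datetime import datetime
candysales = []
with open('/path/to/sales.csv') as f:
    for row in csv.DictReader(f):
        # Dates in the sample are month/day/year, e.g. 1/20/2010
        row['date'] = datetime.strptime(row['date'], '%m/%d/%Y').date()
        candysales.append(row)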
Edit: I've just noticed that the input isn't CSV (it looks like a whitespace-separated or fixed-width format). The csv module works with CSV files or tab-delimited files, but probably won't work well with this format as it stands. If you can control the format of this file, CSV would be a good choice; if not, we can convert it on the fly using the re module:
import csv
import re
from datetime import datetime

def csvify(iterable):
    # Replace each run of whitespace with a single comma
    for line in iterable:
        yield re.sub(r'\s+', ',', line.rstrip())

candysales = []
with open('/path/to/sales.csv') as f:
    for row in csv.DictReader(csvify(f)):
        row['date'] = datetime.strptime(row['date'], '%m/%d/%Y').date()
        candysales.append(row)
The csvify function is a generator. It is passed to csv.DictReader in place of the file, and it yields each line of the underlying file after replacing every run of one or more whitespace characters with a single comma, thus converting the input to CSV.
This probably won't serve as a general-purpose solution to converting fixed-width text formats to CSV, but it will work if the example you've given above is representative.
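To get exactly the shape asked for in the question (a 'sales' key holding an integer rather than a 'candy' key holding a string), the rows can be rebuilt instead of modified in place. The following is only a sketch under a few assumptions: it targets Python 3, the helper name load_sales and the SAMPLE string are made up for illustration, and io.StringIO stands in for the real file.

```python
import csv
import io
import re
from datetime import date, datetime

# Stand-in for the real file, using the sample data from the question.
SAMPLE = "date candy\n1/12/2011 300\n1/20/2010 200\n1/16/2010 200\n"

def csvify(lines):
    """Yield each line with runs of whitespace replaced by a single comma."""
    for line in lines:
        yield re.sub(r"\s+", ",", line.strip())

def load_sales(lines):
    """Return a list of {'date': date, 'sales': int} dictionaries."""
    sales = []
    for row in csv.DictReader(csvify(lines)):
        sales.append({
            "date": datetime.strptime(row["date"], "%m/%d/%Y").date(),
            "sales": int(row["candy"]),  # header column is 'candy' in the sample
        })
    return sales

candysales = load_sales(io.StringIO(SAMPLE))
print(candysales)
```

With a real file, the same function should work when given the open file object, for example load_sales(open('/path/to/sales.csv')).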
You can read the entire file into a string (here fin is the already-opened file object, and the datetime module has been imported)
data = fin.read()
Split it into lines
data = data.splitlines()
Use a list comprehension like
[{'date': datetime.datetime.strptime(k, "%m/%d/%Y"), 'sales': v}
 for (k, v) in [e.split() for e in data[1:]]]
which will give you a result like
[{'date': datetime.datetime(2011, 1, 12, 0, 0), 'sales': '300'}, {'date': datetime.datetime(2010, 1, 20, 0, 0), 'sales': '200'}, {'date': datetime.datetime(2010, 1, 16, 0, 0), 'sales': '200'}]
In case reading the entire file into memory is an issue for you, you can process it line by line instead
candysales = []
fin.readline()  # skip the header line
for d in fin:
    k, v = d.split()
    candysales.append({'date': datetime.datetime.strptime(k, "%m/%d/%Y"), 'sales': v})
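Note that both snippets in this answer leave 'date' as a datetime.datetime and 'sales' as a string. If plain dates and integers are wanted, as in the question, the line-by-line approach could be wrapped in a generator along these lines. This is a sketch assuming Python 3; the name iter_sales is hypothetical, and io.StringIO is used here only so the example runs without a file on disk.

```python
import io
from datetime import date, datetime

def iter_sales(lines):
    """Yield one {'date': date, 'sales': int} dict per data line."""
    lines = iter(lines)
    next(lines, None)  # skip the header line, if there is one
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        day, amount = line.split()
        yield {
            "date": datetime.strptime(day, "%m/%d/%Y").date(),
            "sales": int(amount),
        }

sample = io.StringIO("date candy\n1/12/2011 300\n1/20/2010 200\n1/16/2010 200\n")
candysales = list(iter_sales(sample))
print(candysales)
```

Because iter_sales is a generator, only one line is held in memory at a time; calling list() on it is needed only when the full list is required.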