I have one csv file, csv_file.csv, which has multiple records for each state; each state is identified by an id. A sample looks like:
state_id,year,value
01,2012,8.0
01,2012,8.1
01,2012,8.0
01,2012,7.7
01,2013,7.3
01,2013,7.0
01,2013,7.0
I want to convert the state_id in the above dataset to the corresponding state_name and write the records into another csv file, output.csv, so that for each state all the value fields come in one row and the output becomes:
Alabama,8.0,8.1,8.0,7.7,7.3,7.0,7.0
Alaska,8.1,8.1,8.0,7.4,7.25,7.6,7.5
For the mapping I have another csv file, state.csv, with the mapping details:
state_id,state_name
01,Alabama
02,Alaska
04,Arizona
05,Arkansas
06,California
08,Colorado
09,Connecticut
I wrote this code, but it only seems to convert 4 records of csv_file.csv (the top 4 records for state_id 01 and year 2012): when I open Output.csv I see only 4 records, and even for those the value field is repeated. My current code is:
import csv
reader_csv = csv.reader(open('csv_file.csv', 'rb'))
reader_state = csv.reader(open('states.csv', 'rb'))
file_write = open('Output.csv', 'a')
writer = csv.writer(file_write)
for line in reader_csv:
    for states in reader_state:
        if line[0] == states[0]:
            print line[0]+'='+states[1]
            writer.writerow([states[1]]+[line[1]]+[line[2]])
            break
file_write.close()
What mistake am I making here, and how can I do the mapping to change state_id to state_name?
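The likely mistake in the question's code is that reader_state is a one-shot iterator: the inner loop resumes wherever it last stopped, and once it has run through the whole of states.csv it is exhausted, so later lines of csv_file.csv have nothing left to match against. A minimal sketch, with io.StringIO standing in for states.csv, shows a second pass over the same csv.reader yielding nothing:

```python
import csv
import io

# In-memory stand-in for states.csv (first two rows of the question's sample).
reader_state = csv.reader(io.StringIO("01,Alabama\n02,Alaska\n"))

first_pass = [row for row in reader_state]
second_pass = [row for row in reader_state]  # the reader is already exhausted

print(first_pass)   # [['01', 'Alabama'], ['02', 'Alaska']]
print(second_pass)  # []
```

This is why the answers below read the state file once into a dictionary instead of re-scanning it for every input line.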
Here is my approach: convert state.csv into a look-up dictionary, then read the input, translate each state_id, and write:
import csv
# Build a state_id -> state_name look-up from the two-column state.csv
# (Python 2: csv files are opened in binary mode)
with open('state.csv', 'rb') as f:
    id2name = dict(csv.reader(f))
with open('csv_file.csv', 'rb') as ifile, open('output.csv', 'wb') as ofile:
    reader = csv.reader(ifile)
    writer = csv.writer(ofile)
    for state_id, year, value in reader:
        state = id2name[state_id]
        writer.writerow([state, year, value])
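The answer above targets Python 2 ('rb'/'wb' modes). A rough Python 3 sketch of the same look-up idea, written as a hypothetical translate() helper over file objects so it can be tried with io.StringIO in place of real files, might look like this:

```python
import csv
import io

def translate(state_file, data_file, out_file):
    """Write one [state_name, year, value] row per input row (hypothetical helper)."""
    # dict() accepts an iterable of 2-item rows, so each "id,name" row becomes a key/value pair.
    id2name = dict(csv.reader(state_file))
    writer = csv.writer(out_file)
    for state_id, year, value in csv.reader(data_file):
        # The header row translates too: "state_id" maps to "state_name".
        writer.writerow([id2name[state_id], year, value])

# In-memory stand-ins for state.csv and csv_file.csv (rows taken from the question's samples).
states = io.StringIO("state_id,state_name\n01,Alabama\n02,Alaska\n")
data = io.StringIO("state_id,year,value\n01,2012,8.0\n01,2013,7.3\n")
out = io.StringIO()
translate(states, data, out)
print(out.getvalue())
```

With real files on Python 3, open all three with newline='' (and the output with 'w'), e.g. `with open('state.csv', newline='') as s, open('csv_file.csv', newline='') as d, open('output.csv', 'w', newline='') as o: translate(s, d, o)`.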
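Update: to write all the values on the same line, this solution uses the itertools.groupby function to group the records by the first field (state_id). Because groupby only groups consecutive records, this assumes the input is ordered by state_id, as in the sample. The output will not have a header.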
import csv
from itertools import groupby
with open('state.csv', 'rb') as f:
    id2name = dict(csv.reader(f))
with open('csv_file.csv', 'rb') as ifile, open('output.csv', 'wb') as ofile:
    reader = csv.reader(ifile)
    next(reader)  # skip the header
    writer = csv.writer(ofile)
    # Group by the state_id, which is the first field (record[0])
    group_by_state_id = groupby(reader, lambda record: record[0])
    for state_id, record_group in group_by_state_id:
        state = id2name[state_id]
        # Keep only the value field of each record in this group
        values = [value for _, _, value in record_group]
        writer.writerow([state] + values)
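Since itertools.groupby only merges adjacent records that share a key, the answer above relies on csv_file.csv already being ordered by state_id. A small sketch with made-up rows (the 02 row is not from the question) shows the difference sorting first makes:

```python
from itertools import groupby
from operator import itemgetter

# Made-up rows: state 01 is interrupted by a state 02 record.
rows = [["01", "2012", "8.0"], ["02", "2012", "8.1"], ["01", "2013", "7.3"]]

# Without sorting, state 01 shows up as two separate groups.
unsorted_groups = [key for key, _ in groupby(rows, key=itemgetter(0))]

# Sorting by state_id first (sorted() is stable) yields one group per state.
sorted_groups = [
    (key, [value for _, _, value in group])
    for key, group in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0))
]

print(unsorted_groups)  # ['01', '02', '01']
print(sorted_groups)    # [('01', ['8.0', '7.3']), ('02', ['8.1'])]
```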
If your system has sqlite3 installed (my Mac comes with it pre-installed), the following script will produce the desired result. Be sure to remove the headers from your csv files first.
-- script.sql
.mode csv
CREATE TABLE state (sid TEXT, name TEXT);
.import state.csv state
CREATE TABLE raw (sid TEXT, year INT, value REAL);
.import csv_file.csv raw
SELECT state.name, group_concat(raw.value)
FROM state, raw
WHERE state.sid = raw.sid
GROUP BY state.name;
To use it:
$ sqlite3 < script.sql > output.csv
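If the sqlite3 command-line shell isn't available, Python's built-in sqlite3 module can run the same join and group_concat query. This is a sketch with a hypothetical concat_by_state() helper; it skips the header rows in code rather than deleting them from the files. The order of the values that group_concat concatenates is not guaranteed, so they may not come out in file order.

```python
import csv
import io
import sqlite3

def concat_by_state(state_file, data_file):
    """Return (state_name, 'v1,v2,...') tuples, mirroring script.sql (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE state (sid TEXT, name TEXT)")
    conn.execute("CREATE TABLE raw (sid TEXT, year INT, value REAL)")
    states = csv.reader(state_file)
    next(states)  # skip the header instead of deleting it from the file
    conn.executemany("INSERT INTO state VALUES (?, ?)", states)
    raw = csv.reader(data_file)
    next(raw)  # same for csv_file.csv
    conn.executemany("INSERT INTO raw VALUES (?, ?, ?)", raw)
    rows = conn.execute(
        "SELECT state.name, group_concat(raw.value) FROM state, raw "
        "WHERE state.sid = raw.sid GROUP BY state.name"
    ).fetchall()
    conn.close()
    return rows

# In-memory stand-ins using the question's sample data.
states = io.StringIO("state_id,state_name\n01,Alabama\n02,Alaska\n")
data = io.StringIO(
    "state_id,year,value\n01,2012,8.0\n01,2012,8.1\n01,2012,8.0\n01,2012,7.7\n"
    "01,2013,7.3\n01,2013,7.0\n01,2013,7.0\n"
)
result = concat_by_state(states, data)
print(result)
```

Alaska does not appear in the result because the join only keeps states that have at least one row in csv_file.csv.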
You should store your states' unique identifiers in a dictionary, then look up the state_id of each line of csv_file.csv in that dictionary.
import csv
reader_csv = csv.reader(open('csv_file.csv', 'r'))  # no b flag for python3
# newline='' lets the csv module control line endings (avoids blank rows on Windows)
file_write = open('output.csv', 'a', newline='')
writer = csv.writer(file_write)
# Dictionary construction
with open('states.csv', mode='r') as infile:
    reader = csv.reader(infile)
    states_dict = {rows[0]: rows[1] for rows in reader}
# File writing
for line in reader_csv:
    writer.writerow([states_dict[line[0]]] + [line[1]] + [line[2]])
file_write.close()
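In the answer above, states_dict[line[0]] raises KeyError for any state_id that is not in the mapping file (the sample state.csv, for instance, has no 03 row). A minimal sketch using a hypothetical translate_rows() generator, with a placeholder value chosen here purely for illustration, falls back with dict.get instead:

```python
import csv
import io

def translate_rows(states_dict, data_file, missing="UNKNOWN"):
    """Yield [state_name, year, value]; unmapped ids get the placeholder `missing`."""
    for state_id, year, value in csv.reader(data_file):
        # dict.get returns `missing` instead of raising KeyError for an unknown id.
        yield [states_dict.get(state_id, missing), year, value]

states_dict = {"01": "Alabama", "02": "Alaska"}
# The 03 row is made up to show the fallback.
data = io.StringIO("01,2012,8.0\n03,2012,5.0\n")
rows = list(translate_rows(states_dict, data))
print(rows)
```

Whether to emit a placeholder, skip the row, or fail loudly is a judgment call that depends on how trustworthy the mapping file is.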
Using csv.DictReader and csv.DictWriter:
import csv
with open('state.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    states = {row.get('state_id'): row.get('state_name') for row in reader}
with open('csv_file.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    # 'wb' is for Python 2; on Python 3 use open('output.csv', 'w', newline='')
    with open('output.csv', 'wb') as outfile:
        fieldnames = ['state_name', 'year', 'value']
        writer = csv.DictWriter(outfile, fieldnames=fieldnames)
        writer.writeheader()
        for row in reader:
            writer.writerow({'state_name': states.get(row.get('state_id')),
                             'year': row.get('year'),
                             'value': row.get('value')})
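The DictReader version above still writes one output row per input record. To get the one-row-per-state layout the question asks for without requiring sorted input (as itertools.groupby does), one option is to collect the values in a collections.defaultdict. A sketch with a hypothetical rows_per_state() helper, tried on in-memory data:

```python
import csv
import io
from collections import defaultdict

def rows_per_state(state_file, data_file):
    """Return [[state_name, value, value, ...], ...] with one list per state."""
    states = {row["state_id"]: row["state_name"] for row in csv.DictReader(state_file)}
    values = defaultdict(list)  # state_name -> values in input order
    for row in csv.DictReader(data_file):
        values[states[row["state_id"]]].append(row["value"])
    # dicts keep insertion order (Python 3.7+), so states appear in first-seen order.
    return [[name] + vals for name, vals in values.items()]

states = io.StringIO("state_id,state_name\n01,Alabama\n02,Alaska\n")
# Interleaved, made-up rows: no sorting is needed.
data = io.StringIO("state_id,year,value\n01,2012,8.0\n02,2012,8.1\n01,2013,7.3\n")
out = io.StringIO()
csv.writer(out).writerows(rows_per_state(states, data))
print(out.getvalue())
```

The trade-off against groupby is memory: all values are held until the input is fully read, which is fine for a file of this size.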