
How to map state code from one csv file to state name in another csv file in python?

I have one CSV file, csv_file.csv, which has multiple records for each state, and each state is identified by an id. A sample looks like:

state_id,year,value
01,2012,8.0
01,2012,8.1
01,2012,8.0
01,2012,7.7
01,2013,7.3
01,2013,7.0
01,2013,7.0

I want to convert the state_id in the above dataset to the corresponding state_name and write the records to another CSV file, output.csv, so that all the value fields for each state appear in one row. The output should look like:

Alabama,8.0,8.1,8.0,7.7,7.3,7.0,7.0
Alaska,8.1,8.1,8.0,7.4,7.25,7.6,7.5

For the mapping I have another CSV file, state.csv, with the mapping details:

state_id,state_name
01,Alabama
02,Alaska
04,Arizona
05,Arkansas
06,California
08,Colorado
09,Connecticut

I wrote the code below, but it seems to convert only 4 records of csv_file.csv (the top 4 records, for state_id 01 and year 2012). When I open Output.csv I see only those 4 records, and even for them the value field is repeated. My current code is:

reader_csv = csv.reader(open('csv_file.csv', 'rb'))
reader_state = csv.reader(open('states.csv', 'rb'))
file_write = open('Output.csv', 'a')
writer = csv.writer(file_write)

for line in reader_csv:
    for states in reader_state:
        if line[0] == states[0]:
           print line[0]+'='+states[1]
           writer.writerow([states[1]]+[line[1]]+[line[2]])
           break

file_write.close()

What mistake am I making here, and how can I do the mapping to change state_id to state_name?

Here is my approach: convert state.csv into a look-up dictionary, then read the input, translate each state_id, and write the result:

import csv

with open('state.csv', 'rb') as f:
    id2name = dict(csv.reader(f))

with open('csv_file.csv', 'rb') as ifile, open('output.csv', 'wb') as ofile:
    reader = csv.reader(ifile)
    writer = csv.writer(ofile)

    for state_id, year, value in reader:
        state = id2name[state_id]
        writer.writerow([state, year, value])
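
As for why the nested loops in the question stop early: a csv.reader is a one-pass iterator, so the inner loop never starts over from the top of the mapping file. Once it has been read to the end, every later lookup finds nothing. The following is a minimal Python 3 sketch of that behaviour; the in-memory sample data and the lookup helper are hypothetical and only stand in for the question's inner loop.

```python
import csv
import io

# Hypothetical in-memory stand-in for the mapping file (Python 3).
state_rows = io.StringIO("state_id,state_name\n01,Alabama\n02,Alaska\n")
reader_state = csv.reader(state_rows)


def lookup(state_id):
    """Scan the shared reader the same way the question's inner loop does."""
    for row in reader_state:
        if row[0] == state_id:
            return row[1]  # comparable to the `break` in the question
    return None


first = lookup("01")   # found; the reader now sits just after the Alabama row
second = lookup("01")  # resumes after Alabama, runs to the end, finds nothing
third = lookup("02")   # the reader is already exhausted, so nothing is found

print(first, second, third)
```

Building the dictionary once, as above, avoids the problem because the mapping file is read exactly one time.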

Update

I updated the code to write all the values on the same line. This solution uses the itertools.groupby function to group the records by the first field. The output will not have a header.

import csv
from itertools import groupby

with open('state.csv', 'rb') as f:
    id2name = dict(csv.reader(f))

with open('csv_file.csv', 'rb') as ifile, open('output.csv', 'wb') as ofile:
    reader = csv.reader(ifile)
    next(reader)  # skip the header
    writer = csv.writer(ofile)

    # Group by the state_id, which is the first field (record[0])
    group_by_state_id = groupby(reader, lambda record: record[0])
    for state_id, record_group in group_by_state_id:
        state = id2name[state_id]
        values = [value for state_id, year, value in record_group]
        writer.writerow([state] + values)
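
One caveat worth illustrating: itertools.groupby only merges consecutive records with the same key, so this approach relies on csv_file.csv already being ordered by state_id, as the sample in the question is. The sketch below uses made-up rows (not taken from the question's data) to show the difference; if the input might be unordered, sorting it first is one possible workaround.

```python
from itertools import groupby

# Hypothetical records: (state_id, year, value), deliberately not sorted.
rows = [("01", "2012", "8.0"), ("02", "2012", "8.1"), ("01", "2013", "7.3")]


def first_field(record):
    return record[0]


# Without sorting, state 01 shows up as two separate groups.
unsorted_keys = [key for key, _ in groupby(rows, key=first_field)]

# Sorting by the same key first yields one group per state.
sorted_keys = [key for key, _ in groupby(sorted(rows, key=first_field), key=first_field)]

print(unsorted_keys)
print(sorted_keys)
```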

Update 2

If your system has sqlite3 installed (my Mac comes with it pre-installed), then the following script will produce the desired result. Be sure to remove the header rows from your CSV files first.

-- script.sql

.mode csv

CREATE TABLE state (sid TEXT, name TEXT);
.import state.csv state

CREATE TABLE raw (sid TEXT, year INT, value REAL);
.import csv_file.csv raw

SELECT state.name, group_concat(raw.value)
FROM state, raw
WHERE state.sid = raw.sid
GROUP BY state.name;

To use it:

$ sqlite3 < script.sql > output.csv
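
If you would rather stay in Python, the same join can be sketched with the standard-library sqlite3 module. This is only an illustration under assumptions: it uses an in-memory database and a few hand-typed rows instead of importing the CSV files, and it adds an ORDER BY so the row order is predictable. The order of values inside group_concat is not guaranteed by SQLite.

```python
import sqlite3

# In-memory database with hypothetical sample rows (not read from the CSV files).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE state (sid TEXT, name TEXT)")
conn.execute("CREATE TABLE raw (sid TEXT, year INT, value REAL)")
conn.executemany(
    "INSERT INTO state VALUES (?, ?)",
    [("01", "Alabama"), ("02", "Alaska")],
)
conn.executemany(
    "INSERT INTO raw VALUES (?, ?, ?)",
    [("01", 2012, 8.0), ("01", 2012, 8.1), ("02", 2012, 7.4)],
)

# Same query as script.sql, written with an explicit JOIN and a stable row order.
result = conn.execute(
    "SELECT state.name, group_concat(raw.value) "
    "FROM state JOIN raw ON state.sid = raw.sid "
    "GROUP BY state.name ORDER BY state.name"
).fetchall()
conn.close()

for name, values in result:
    print(name, values)
```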

You should store your states' unique identifiers in a dictionary. Then look up the name in that dictionary for each line of csv_file.csv.

import csv

reader_csv = csv.reader(open('csv_file.csv', 'r')) # no b flag for python3
file_write = open('output.csv', 'a')
writer = csv.writer(file_write)

# Dictionary construction
with open('states.csv', mode='r') as infile:
    reader = csv.reader(infile)
    states_dict = {rows[0]:rows[1] for rows in reader}

# File writing
for line in reader_csv:
    writer.writerow([states_dict[line[0]]]+[line[1]]+[line[2]])
file_write.close()

import csv

with open('state.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    states = {row.get('state_id'): row.get('state_name') for row in reader}

with open('csv_file.csv') as csvfile:
    reader = csv.DictReader(csvfile)

    with open('output.csv', 'wb') as outfile:
        fieldnames = ['state_name', 'year', 'value']
        writer = csv.DictWriter(outfile, fieldnames=fieldnames)
        writer.writeheader()

        for row in reader:
            writer.writerow({'state_name': states.get(row.get('state_id')), 'year': row.get('year'), 'value': row.get('value')})


 