I'm working on a Python script. Most of my data is recorded in a vertical (long) layout and I want to convert it to a horizontal (wide) one.
Here is an example of the data I have:
ID,Identifier,Value
1_UK,City,Paris
1_UK,Number of the departments,75
1_UK,Department,Ile de France
1_UK,Habitant,12405426hab
2_UK,City,Ajaccio
2_UK,Number of the departments,2A
2_UK,Department,Corse du Sud
And here is where I want to go:
ID, City, Number of the departments, Department, Habitant
1_UK, Paris, 75, Ile de France, 12405426hab
2_UK, Ajaccio, 2A, Corse du sud,''
Reading a CSV file in Python is not difficult. Where I'm lost is that I have 4 identifiers (City, Number of the departments, Department and Habitant), and the ID 2_UK doesn't have a value for Habitant. I don't know how to represent that in my code.
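import csv
csvfile = open("Exercice1.csv", 'r', encoding='utf-8')
IDs=[]
identifiers=[]
uniqueIDs=[]
uniqueidentifiers=[]
reader=csv.reader(csvfile)
next(reader)  # skip the header row (ID,Identifier,Value)
for row in reader:
    # each row holds three fields: ID, Identifier, Value
    ID, identifier, value = row
    IDs.append(ID)
    identifiers.append(identifier)
csvfile.close()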
# remove duplicate values and keep the order as it is
for i in IDs:
    if i not in uniqueIDs:
        uniqueIDs.append(i)
for i in identifiers:
    if i not in uniqueidentifiers:
        uniqueidentifiers.append(i)
And then I'm lost: the zip function doesn't seem to meet my needs, or I'm not using it properly.
Happy to hear your advice.
Thank you!
It's easy using pandas. You can import your .csv file into a DataFrame df and then use pivot:
In [10]: d = df.pivot(index='ID', columns='Identifier', values='Value')
In [11]: d
Out[11]:
Identifier     City     Department     Habitant  Number of the departments
ID
1_UK          Paris  Ile de France  12405426hab                         75
2_UK        Ajaccio   Corse du Sud         None                         2A
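For completeness, here is a minimal sketch of the whole round trip with pandas, from reading the CSV to producing the wide output. The sample data is inlined through io.StringIO so the snippet runs on its own; in practice you would pass the path to Exercice1.csv instead. The names raw, wide, order and csv_text are only illustrative. Note that pivot sorts the columns alphabetically, so the sketch reorders them to match first appearance in the input, and it assumes every (ID, Identifier) pair occurs at most once, since pivot raises an error on duplicates.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the example is self-contained.
raw = """\
ID,Identifier,Value
1_UK,City,Paris
1_UK,Number of the departments,75
1_UK,Department,Ile de France
1_UK,Habitant,12405426hab
2_UK,City,Ajaccio
2_UK,Number of the departments,2A
2_UK,Department,Corse du Sud
"""

df = pd.read_csv(io.StringIO(raw))

# Long to wide: one row per ID, one column per Identifier.
wide = df.pivot(index="ID", columns="Identifier", values="Value")

# pivot sorts columns alphabetically; restore the order of first appearance.
order = list(dict.fromkeys(df["Identifier"]))
wide = wide.reindex(columns=order)

# Missing combinations (2_UK has no Habitant) become empty strings.
wide = wide.fillna("")
wide.columns.name = None

# Turn the ID index back into a regular column and serialise to CSV text.
csv_text = wide.reset_index().to_csv(index=False)
print(csv_text)
```

To write a file rather than a string, pass a path as the first argument of to_csv.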
You could do something along the lines of:
import csv
cities = {}
with open('Exercice1.csv', 'r') as f:
    reader = csv.DictReader(f)
    for d in reader:
        new_dict = {d['Identifier']: d['Value'], 'ID': d['ID']}
        try:
            cities[d['ID']] = {**cities[d['ID']], **new_dict}
        except KeyError:
            cities[d['ID']] = {**new_dict}
with open('output.csv', 'w') as f:
    field_names = ['ID', 'City', 'Number of the departments', 'Department', 'Habitant']
    writer = csv.DictWriter(f, fieldnames=field_names, lineterminator='\n', restval='')
    writer.writeheader()
    for k, v in cities.items():
        writer.writerow(v)
Using your data this gives me:
ID,City,Number of the departments,Department,Habitant
1_UK,Paris,75,Ile de France,12405426hab
2_UK,Ajaccio,2A,Corse du Sud,
The restval argument in csv.DictWriter is what is inserted in a row if the dict provided doesn't have a key from the field_names list. I just used an empty string, but you could replace it with whatever you like.
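If you would rather not hard-code the column names, one possible variant (a sketch, not the only way to do it) collects them from the Identifier column in order of first appearance and still relies on restval for the gaps. The data is inlined through io.StringIO and the output is written to an in-memory buffer so the snippet runs on its own; the names raw, rows and result are only illustrative.

```python
import csv
import io

# Sample data from the question, inlined so the example is self-contained.
raw = """\
ID,Identifier,Value
1_UK,City,Paris
1_UK,Number of the departments,75
1_UK,Department,Ile de France
1_UK,Habitant,12405426hab
2_UK,City,Ajaccio
2_UK,Number of the departments,2A
2_UK,Department,Corse du Sud
"""

rows = {}             # ID -> {"ID": ..., identifier: value, ...}
field_names = ["ID"]  # grows as new identifiers are seen

for rec in csv.DictReader(io.StringIO(raw)):
    # Create the row for this ID on first sight, then add the new field.
    row = rows.setdefault(rec["ID"], {"ID": rec["ID"]})
    row[rec["Identifier"]] = rec["Value"]
    if rec["Identifier"] not in field_names:
        field_names.append(rec["Identifier"])

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=field_names,
                        lineterminator="\n", restval="")
writer.writeheader()
# restval fills in the keys a row lacks (Habitant for 2_UK).
writer.writerows(rows.values())

result = out.getvalue()
print(result)
```

Because plain dicts preserve insertion order, the IDs come out in the order they first appear in the input.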