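I am given a CSV file by a third party, so its format is out of my control, and it has two issues:
1. Different CSV files have different but similarly named columns (Column2[en] and Column2[us] in file A, Column2[fr] and Column2[en] in file B).
2. Some column names appear more than once (isPartOf in both files).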
CSV File A
```
File Name,Column2[en],Column2[us],isPartOf,isPartOf
file1.tif,English,US English,USA,North America
file2.tif,English,,USA,
file3.tif,,US,,North America
```
CSV File B
```
File Name,Column2[fr],Column2[en],isPartOf,isPartOf
```
Is it possible, using csv.DictReader, to use startswith() to read multiple columns? Or do I need to build a map of the header row and handle the columns separately before reading the CSV with DictReader?
Is it possible to load the data from both columns that share the same name? I know pandas DataFrames can do something like this, but I am not allowed to use pandas.
```python
#!/usr/bin/env python3
import csv

with open("./test.csv", newline="") as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter=',')
    for row in csv_reader:
        print(row["isPartOf"], row["isPartOf"])
```
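I run this using:
```
$ ./csvReader.py
North America North America
North America
```
csv.DictReader keeps only the value from the last isPartOf column, so the USA values from the first one are lost.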
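Since the question already suggests mapping the header row before handing the file to DictReader, here is a minimal sketch of that route. It assumes that renaming repeated columns to name_2, name_3, ... is acceptable and that no real column already uses such a name; dedupe_header, read_rows and columns_starting_with are hypothetical helpers, not part of the csv module. The de-duplicated names are passed through DictReader's fieldnames parameter, and str.startswith() then picks out both the similarly named and the repeated columns.

```python
import csv
import io
from collections import Counter


def dedupe_header(header):
    """Hypothetical helper: make repeated column names unique.

    The first occurrence keeps its name; later ones become name_2, name_3, ...
    (an arbitrary suffix scheme chosen for this sketch).
    """
    seen = Counter()
    unique = []
    for name in header:
        seen[name] += 1
        unique.append(name if seen[name] == 1 else f"{name}_{seen[name]}")
    return unique


def read_rows(csv_file):
    """Hypothetical helper: read the header ourselves, then delegate to DictReader."""
    header = next(csv.reader(csv_file))
    # Because fieldnames is supplied, DictReader does not consume a header row;
    # it continues from the line after the one csv.reader just read.
    return csv.DictReader(csv_file, fieldnames=dedupe_header(header))


def columns_starting_with(fieldnames, prefix):
    """Hypothetical helper: all column names beginning with prefix, in file order."""
    return [name for name in fieldnames if name.startswith(prefix)]


sample = """File Name,Column2[en],Column2[us],isPartOf,isPartOf
file1.tif,English,US English,USA,North America
file2.tif,English,,USA,
file3.tif,,US,,North America"""

reader = read_rows(io.StringIO(sample))
part_of_cols = columns_starting_with(reader.fieldnames, "isPartOf")
lang_cols = columns_starting_with(reader.fieldnames, "Column2[")
for row in reader:
    part_of = [row[c] for c in part_of_cols]
    languages = [row[c] for c in lang_cols]
    print(part_of, languages)
```

With a real file, open it with newline="" (as the csv module documentation recommends) and pass the file object to read_rows.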
You could create a class that uses csv.reader to read the first line, uses its column names to work out how to handle duplicate columns, and then yields rows as dictionaries when iterated over. This example groups all columns by name; if several columns share the same name, the dictionary holds a tuple of all their values under that name.
```python
import csv
import collections


class DuplicateColumnDictReader:
    """Like csv.DictReader, but keeps every value of duplicated column names."""

    def __init__(self, iterable, dialect='excel', **kwargs):
        self.reader = csv.reader(iterable, dialect, **kwargs)
        self.header = next(self.reader)
        # Map each column name to the list of positions where it appears.
        self.columns_grouping = collections.defaultdict(list)
        for index, col_name in enumerate(self.header):
            self.columns_grouping[col_name].append(index)

    def __iter__(self):
        return self

    def __next__(self):
        row = next(self.reader)
        row_dict = {}
        for col_name, col_indices in self.columns_grouping.items():
            if len(col_indices) == 1:
                # Unique column name: store the single value as-is.
                row_dict[col_name] = row[col_indices[0]]
            else:
                # Duplicated column name: store all values in header order.
                row_dict[col_name] = tuple(row[index] for index in col_indices)
        return row_dict
```
Running it on your file A:
```python
import io

csv_str = """File Name,Column2[en],Column2[us],isPartOf,isPartOf
file1.tif,English,US English,USA,North America
file2.tif,English,,USA,
file3.tif,,US,,North America"""

reader = DuplicateColumnDictReader(io.StringIO(csv_str), delimiter=",")
for row in reader:
    print(row["isPartOf"])
```
This prints:
```
('USA', 'North America')
('USA', '')
('', 'North America')
```
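The class above returns a plain string for unique columns and a tuple for repeated ones, so calling code has to know which kind of column it is reading. A variant, sketched here as a generator under the assumption that a list for every column is acceptable, gives every value the same type. grouped_rows is a hypothetical name; skipping blank lines mirrors csv.DictReader, and padding short rows with "" is an arbitrary choice for this sketch (the class above would raise IndexError on such rows).

```python
import csv
import io
from collections import defaultdict


def grouped_rows(iterable, **fmtparams):
    """Hypothetical helper: yield one dict per data row, mapping each
    column name to a list of its values (one entry per occurrence)."""
    reader = csv.reader(iterable, **fmtparams)
    header = next(reader, None)
    if header is None:                  # empty input: nothing to yield
        return
    positions = defaultdict(list)       # column name -> indices in the row
    for index, name in enumerate(header):
        positions[name].append(index)
    for row in reader:
        if not row:                     # skip blank lines, as DictReader does
            continue
        # Missing trailing fields are filled with "" (assumption for this sketch).
        yield {name: [row[i] if i < len(row) else "" for i in idxs]
               for name, idxs in positions.items()}


sample = """File Name,Column2[en],Column2[us],isPartOf,isPartOf
file1.tif,English,US English,USA,North America
file2.tif,English,,USA,
file3.tif,,US,,North America"""

for row in grouped_rows(io.StringIO(sample)):
    print(row["File Name"][0], row["isPartOf"])
```

Whether a uniform list is better than the str/tuple mix depends on the caller: it costs an extra [0] for unique columns but removes the need for isinstance checks.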