
How to read CSV files having multiple columns with same or similar names?

I am given a CSV file, provided by a third party and outside my control, that has two issues:

  1. Columns with similar names
  2. Columns with the same name

Different CSV files will have different similar names.

CSV File A

File Name,Column2[en],Column2[us],isPartOf,isPartOf
file1.tif,English,US English,USA,North America
file2.tif,English,,USA,
file3.tif,,US,,North America

CSV File B

File Name,Column2[fr],Column2[en],isPartOf,isPartOf

Is it possible, using csv.DictReader, to use startswith() to read multiple columns? Or do I need to build a map of the header row and map the columns separately before reading the CSV with DictReader?

Is it possible to load the data from both columns that share the same name? I know you can do something with DataFrames in pandas, but I am not allowed to use pandas.

#!/usr/bin/env python3

import csv

with open("./test.csv", newline="") as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter=',')
    for row in csv_reader:
        # Both lookups hit the same key, so the same value is printed twice.
        print(row["isPartOf"], row["isPartOf"])

I run this using:

$ ./csvReader.py 
North America North America
North America
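
A minimal sketch of why both lookups return the same value: csv.DictReader builds each row as a dictionary keyed by header name, so when two columns share a name the later column's value overwrites the earlier one. The inline data reuses File A, and io.StringIO stands in for the real file.

```python
import csv
import io

csv_str = """File Name,Column2[en],Column2[us],isPartOf,isPartOf
file1.tif,English,US English,USA,North America
file2.tif,English,,USA,
file3.tif,,US,,North America"""

reader = csv.DictReader(io.StringIO(csv_str))

# fieldnames still lists both headers...
print(reader.fieldnames)

# ...but each row dict has a single "isPartOf" key holding the last column's value.
rows = list(reader)
for row in rows:
    print(repr(row["isPartOf"]))
```

The first isPartOf column ("USA") never reaches the row dictionary, which is why DictReader alone cannot return both values.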

You could create a class that uses csv.reader to read the first line, uses its column names to work out how to handle duplicate columns, and then yields rows as dictionaries when iterated over. This example groups all columns by name and, if multiple columns share a name, stores a tuple of all their values under that name in the dictionary.

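import collections
import csv


class DuplicateColumnDictReader:
    """Iterate over CSV rows as dicts, keeping every value of duplicated columns."""

    def __init__(self, iterable, dialect='excel', **kwargs):
        self.reader = csv.reader(iterable, dialect, **kwargs)
        # The first row is the header.
        self.header = next(self.reader)
        # Map each column name to the list of positions where it appears.
        self.columns_grouping = collections.defaultdict(list)
        for index, col_name in enumerate(self.header):
            self.columns_grouping[col_name].append(index)

    def __iter__(self):
        return self

    def __next__(self):
        # StopIteration from the underlying reader ends the iteration.
        row = next(self.reader)
        row_dict = {}
        for col_name, col_indices in self.columns_grouping.items():
            if len(col_indices) == 1:
                # Unique column name: store the plain value.
                row_dict[col_name] = row[col_indices[0]]
            else:
                # Duplicated column name: store all values as a tuple.
                row_dict[col_name] = tuple(row[index] for index in col_indices)
        return row_dict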

Running this with your File A:

import io

csv_str = """File Name,Column2[en],Column2[us],isPartOf,isPartOf
file1.tif,English,US English,USA,North America
file2.tif,English,,USA,
file3.tif,,US,,North America"""

reader = DuplicateColumnDictReader(io.StringIO(csv_str), delimiter=",")
for row in reader:
    print(row["isPartOf"])

Which will print:

('USA', 'North America')
('USA', '')
('', 'North America')
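
For the similarly named columns (Column2[en], Column2[us], Column2[fr]), one possible approach is to match header names with startswith() instead of by exact name. The sketch below is an illustration only: the function name iter_rows_by_prefix and its prefixes parameter are hypothetical, not part of the csv module. It keeps each value paired with its full header name so the language tag is not lost.

```python
import csv
import io


def iter_rows_by_prefix(iterable, prefixes, **kwargs):
    """Yield one dict per row: prefix -> list of (header name, value) pairs.

    Hypothetical helper for illustration; `prefixes` is an assumed parameter.
    """
    reader = csv.reader(iterable, **kwargs)
    header = next(reader)  # first row holds the column names
    for row in reader:
        grouped = {prefix: [] for prefix in prefixes}
        for name, value in zip(header, row):
            for prefix in prefixes:
                if name.startswith(prefix):
                    grouped[prefix].append((name, value))
        yield grouped


csv_str = """File Name,Column2[en],Column2[us],isPartOf,isPartOf
file1.tif,English,US English,USA,North America
file2.tif,English,,USA,
file3.tif,,US,,North America"""

rows = list(iter_rows_by_prefix(io.StringIO(csv_str), ["Column2[", "isPartOf"]))
for row in rows:
    print(row["Column2["], row["isPartOf"])
```

Because the match is by prefix, the same call would also pick up Column2[fr] in File B without any change to the code.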
