Python: How to set the variables of a class, based on a lookup to the column headers of a CSV file

I have a class ETF that has many variables. I included only three below for simplicity, but there are actually close to 40:

class ETF:
    def __init__(self, symbol, name, asset_class):
        self.symbol = symbol
        self.name = name
        self.asset_class = asset_class

There is another file in my project with the following code. My question concerns the two #CODE NEEDED HERE comments.

import csv

# Open the file
data = open('db.csv')
csv_data = csv.reader(data) # csv.reader

# reformat it into a python object list of lists
data_lines = list(csv_data)

headers = data_lines[1] # Retrieving the column headers

# Find the Index positions in headers for each ETF class attribute
#CODE NEEDED HERE

# create ETF objects for each line in the file
for line in data_lines[2:]:
    # CODE NEEDED HERE
    # Lookup the column header based on the

I also have two spreadsheets. One spreadsheet is called db.csv and contains the information we will use to create ETF objects. Each row in this CSV will be its own ETF object. The column headers in the CSV file do not exactly match the variable names in the ETF class, and not every column is used. For that reason, I have a second spreadsheet called column_reference.csv, which I will use to map the column names in db.csv to the ETF variable names.

See the table below for an example of the column_reference.csv file:

[column_reference.csv: table not reproduced]

Please see the image below as an example of the db.csv file:

[db.csv: image not reproduced]

What code would you use to most efficiently map the column headers and create ETF objects?

Use pandas to create a DataFrame from the CSV, then use df.iterrows() to iterate over the rows and initialize an object from each one. By manipulating the df.columns attribute you can set your custom column names.
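A minimal sketch of this approach, not a definitive implementation. It assumes the headers sit on the second row of db.csv (as in the question's code), and that column_reference.csv has no header row and lists the db.csv column name first and the ETF attribute name second. The sample data, the column names (`Ticker`, `Fund Name`, ...) and the helper `load_etfs` are made up for illustration; here the columns are renamed with `DataFrame.rename` rather than by assigning to df.columns, so unused columns can be dropped first.

```python
import io

import pandas as pd


class ETF:
    def __init__(self, symbol, name, asset_class):
        self.symbol = symbol
        self.name = name
        self.asset_class = asset_class


# Hypothetical stand-ins for db.csv and column_reference.csv
DB_CSV = """Exported report,,,
Ticker,Fund Name,Asset Class,Notes
AAA,Alpha Equity Fund,Equity,unused
BBB,Beta Bond Fund,Bond,unused
"""
COLUMN_REFERENCE_CSV = """Ticker,symbol
Fund Name,name
Asset Class,asset_class
"""


def load_etfs(db_file, reference_file):
    # Assumed layout: db.csv header first, ETF attribute name second, no header row
    reference = pd.read_csv(reference_file, header=None,
                            names=["csv_name", "attribute"])
    rename_map = dict(zip(reference["csv_name"], reference["attribute"]))

    # header=1 takes the column names from the second row of the file
    df = pd.read_csv(db_file, header=1, dtype=str)

    # Keep only the mapped columns, then rename them to the attribute names
    df = df[list(rename_map)].rename(columns=rename_map)
    return [ETF(**row.to_dict()) for _, row in df.iterrows()]


etfs = load_etfs(io.StringIO(DB_CSV), io.StringIO(COLUMN_REFERENCE_CSV))
print(etfs[0].symbol, etfs[1].asset_class)
```

With real files, the two `io.StringIO` objects would be replaced by the paths 'db.csv' and 'column_reference.csv'.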

This is the "Pythonic way":

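# Map each db.csv column header to the matching ETF attribute name
with open('column_reference.csv', newline='') as columns:
    columns_dict = {row[0]: row[1] for row in csv.reader(columns)}

# headers and data_lines come from the code in the question
etfs = []
for line in data_lines[2:]:
    values = {}
    for header, attribute in columns_dict.items():
        p_index = headers.index(header)
        # Key by attribute name so the dict can be unpacked into ETF()
        values[attribute] = line[p_index]
    # Build the object once per row, after all values are collected
    etfs.append(ETF(**values))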
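Since the question asks for the most efficient mapping, the header lookup can be done once instead of once per row. The following is a sketch under the same assumptions as above (headers on the second row of db.csv; column_reference.csv holds the db.csv header first and the attribute name second, with no header row). The sample data and the helper name `load_etfs` are hypothetical.

```python
import csv
import io


class ETF:
    def __init__(self, symbol, name, asset_class):
        self.symbol = symbol
        self.name = name
        self.asset_class = asset_class


# Hypothetical stand-ins for db.csv and column_reference.csv
DB_CSV = """Exported report,,,
Ticker,Fund Name,Asset Class,Notes
AAA,Alpha Equity Fund,Equity,unused
BBB,Beta Bond Fund,Bond,unused
"""
COLUMN_REFERENCE_CSV = """Ticker,symbol
Fund Name,name
Asset Class,asset_class
"""


def load_etfs(db_file, reference_file):
    data_lines = list(csv.reader(db_file))
    headers = data_lines[1]  # headers are on the second row

    # header -> column index, built in a single pass over the header row
    position = {header: i for i, header in enumerate(headers)}

    # attribute name -> column index, computed once before the row loop
    attribute_index = {
        row[1]: position[row[0]]
        for row in csv.reader(reference_file)
        if row  # skip blank lines
    }

    return [
        ETF(**{attr: line[i] for attr, i in attribute_index.items()})
        for line in data_lines[2:]
    ]


etfs = load_etfs(io.StringIO(DB_CSV), io.StringIO(COLUMN_REFERENCE_CSV))
print([etf.symbol for etf in etfs])
```

Because the values are passed as keyword arguments, the order of the rows in column_reference.csv does not have to match the order of the parameters in `ETF.__init__`.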

        

I ended up using a series of nested for loops to build a list of values for each CSV row, which let me accomplish this in the shortest amount of time possible. The pandas solution was too time-consuming.

import csv
from ETF import ETF

# Read db.csv into a list of rows (each row is a list of strings)
with open('db.csv', newline='') as data:
    data_lines = list(csv.reader(data))

# Read column_reference.csv as [db.csv header, ETF attribute name] pairs
name_map = []
with open('column_reference.csv', newline='') as f:
    for tokens in csv.reader(f):
        name_map.append([tokens[0], tokens[1]])

# Append to each pair the index of its header in db.csv
# (the column headers are on the second row of the file)
for counter, header in enumerate(data_lines[1]):
    for entry in name_map:
        if entry[0] == header:
            entry.append(counter)

# Create one ETF object per data row. Values are passed positionally,
# so the rows of column_reference.csv must follow the parameter order
# of ETF.__init__
etfs = []
for line in data_lines[2:]:
    etf_characteristics = [line[entry[2]] for entry in name_map]
    etfs.append(ETF(*etf_characteristics))
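One thing to watch in the code above: an index is appended to a name_map entry only when its header is found, so a header that is misspelled in column_reference.csv surfaces later as an IndexError inside the row loop. A possible guard is sketched below; `resolve_columns` is a hypothetical helper, and the header row and mapping are made-up sample values.

```python
def resolve_columns(name_map, headers):
    """Return (attribute, column index) pairs; fail early if a header is missing."""
    missing = [old for old, _new in name_map if old not in headers]
    if missing:
        raise ValueError(f"Headers not found in db.csv: {missing}")
    return [(new, headers.index(old)) for old, new in name_map]


# Hypothetical header row and mapping, for illustration only
headers = ["Ticker", "Fund Name", "Asset Class", "Notes"]
name_map = [["Ticker", "symbol"], ["Asset Class", "asset_class"]]

columns = resolve_columns(name_map, headers)
print(columns)  # [('symbol', 0), ('asset_class', 2)]
```

The helper expects the two-element pairs as read from column_reference.csv, before any index has been appended to them.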
