Python: How to set the variables of a class, based on a lookup to the column headers of a CSV file
I have a class ETF that has many variables. I included just three below for simplicity, but there are actually close to 40:
class ETF:
    def __init__(self, symbol, name, asset_class):
        self.symbol = symbol
        self.name = name
        self.asset_class = asset_class
There is another file in my project with the following code. My question concerns the two CODE NEEDED HERE comments.
import csv
# Open the file
data = open('db.csv')
csv_data = csv.reader(data)  # csv.reader
# Reformat it into a Python list of lists
data_lines = list(csv_data)
headers = data_lines[1]  # Retrieving the column headers
# Find the index positions in headers for each ETF class attribute
#CODE NEEDED HERE
# Create ETF objects for each line in the file
for line in data_lines[2:]:
    # CODE NEEDED HERE
    # Lookup the column header based on the
I also have two spreadsheets. One is called db.csv and contains the information we will use to create ETF objects. Each row in this CSV will be its own ETF object. The column headers in the CSV file do not exactly match the variable names in the ETF class, and not every column is used. For that reason, I have a second spreadsheet called column_reference.csv, which I will use to map the column names in db.csv to the ETF variable names.
See the table below for an example of the column_reference.csv file:
Please see the image below for an example of the db.csv file:
What code would you use to most efficiently map the column headers and create ETF objects?
Use pandas to create a DataFrame out of the CSV, then use df.iterrows() to iterate over the rows and initialize an object from each one. By manipulating the df.columns attribute you can set your custom column names.
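A minimal sketch of that pandas approach, under stated assumptions: the two files are replaced here by invented in-memory sample data, the db.csv headers sit on the second row (as `data_lines[1]` in the question implies), and column_reference.csv has no header row of its own and lists the db.csv header first and the ETF attribute name second. The header names and values are hypothetical.

```python
import io

import pandas as pd


class ETF:
    def __init__(self, symbol, name, asset_class):
        self.symbol = symbol
        self.name = name
        self.asset_class = asset_class


# Invented sample data standing in for db.csv and column_reference.csv.
DB_CSV = (
    "exported data,,,\n"
    "Ticker,Fund Name,Asset Class,Notes\n"
    "AAA,Example Fund A,Equity,unused\n"
    "BBB,Example Fund B,Bond,unused\n"
)
COLUMN_REFERENCE_CSV = "Ticker,symbol\nFund Name,name\nAsset Class,asset_class\n"

# Map each db.csv header to the ETF attribute it should populate.
reference = pd.read_csv(
    io.StringIO(COLUMN_REFERENCE_CSV), header=None, names=["header", "attribute"]
)
rename_map = dict(zip(reference["header"], reference["attribute"]))

# header=1 takes the column names from the second row of the file.
df = pd.read_csv(io.StringIO(DB_CSV), header=1, dtype=str)

# Keep only the mapped columns, renamed to the ETF attribute names.
df = df[list(rename_map)].rename(columns=rename_map)

# Each row becomes the keyword arguments of one ETF object.
etfs = [ETF(**row.to_dict()) for _, row in df.iterrows()]
```

With real files, the `io.StringIO(...)` arguments would be replaced by the file paths. Passing keyword arguments means the order of the rows in column_reference.csv does not have to match the order of the `__init__` parameters.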
This is the "Pythonic way":
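# Map each db.csv column header to its ETF attribute name.
# Assumes column 0 of column_reference.csv is the db.csv header
# and column 1 is the attribute name.
columns_dict = {}
with open('column_reference.csv', newline='') as columns:
    for column in csv.reader(columns):
        columns_dict[column[0]] = column[1]

# headers and data_lines come from the code in the question
for line in data_lines[2:]:
    values = {}
    for key, attribute in columns_dict.items():
        p_index = headers.index(key)
        # Key by the attribute name so that ETF(**values) matches __init__
        values[attribute] = line[p_index]
    ETF(**values)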
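The same idea can be sketched as a self-contained script with `csv.DictReader`, which looks values up by header name so no index bookkeeping is needed. This is a variant, not the answer's exact code. The sample data, the header names, and the helper `load_etfs` are hypothetical, and the sketch assumes one row precedes the header row in db.csv, as `data_lines[1]` in the question suggests.

```python
import csv
import io


class ETF:
    def __init__(self, symbol, name, asset_class):
        self.symbol = symbol
        self.name = name
        self.asset_class = asset_class


# Invented sample data standing in for db.csv and column_reference.csv.
DB_CSV = (
    "exported data,,,\n"
    "Ticker,Fund Name,Asset Class,Notes\n"
    "AAA,Example Fund A,Equity,unused\n"
    "BBB,Example Fund B,Bond,unused\n"
)
COLUMN_REFERENCE_CSV = "Ticker,symbol\nFund Name,name\nAsset Class,asset_class\n"


def load_etfs(db_file, reference_file):
    """Build one ETF per data row, using only the mapped columns."""
    # db.csv header -> ETF attribute name
    header_to_attr = {row[0]: row[1] for row in csv.reader(reference_file) if row}

    next(db_file)  # skip the row that precedes the column headers
    reader = csv.DictReader(db_file)  # reads the header row itself

    return [
        ETF(**{attr: record[header] for header, attr in header_to_attr.items()})
        for record in reader
    ]


etfs = load_etfs(io.StringIO(DB_CSV), io.StringIO(COLUMN_REFERENCE_CSV))
```

With files on disk, the two arguments would be file objects opened with `newline=''`, as the `csv` module documentation recommends.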
I ended up using a series of nested for loops to build a list of values for each CSV row, which let me accomplish this in the shortest amount of time possible. The pandas solution was too time-consuming.
import csv
from ETF import ETF

# Open the file and reformat it into a Python list of lists
with open('db.csv', newline='') as data:
    data_lines = list(csv.reader(data))
print(type(data_lines))

# Build a list of [old, new] name pairs from the column_reference.csv file
name_map = []
with open('column_reference.csv') as f:
    for line in f:
        tokens = line.strip().split(',')  # strip the trailing newline
        old = tokens[0]
        new = tokens[1]
        name_map.append([old, new])

# Append the column index of each mapped header in the database file,
# turning every entry into [old, new, index]
for counter, header in enumerate(data_lines[1]):
    for j in name_map:
        if j[0] == header:
            j.append(counter)

# Create ETF objects based on the indexes of the columns in the database.
# Values are passed positionally, so the rows of column_reference.csv
# must be in the same order as the parameters of ETF.__init__.
for line in data_lines[2:]:
    # Lookup the column header based on the
    etf_characteristics = []
    for i in name_map:
        etf_characteristics.append(line[i[2]])
    this_etf = ETF(*etf_characteristics)
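One possible tightening of the loops above, offered as a sketch rather than as the poster's method: compute each header's position once with a dictionary, fail early if a mapped header is missing from db.csv (the code above would raise an IndexError on `i[2]` in that case), and pass the values by keyword so the order of column_reference.csv no longer matters. The helper `build_column_plan`, the sample data, and the header names are hypothetical.

```python
import csv
import io


class ETF:
    def __init__(self, symbol, name, asset_class):
        self.symbol = symbol
        self.name = name
        self.asset_class = asset_class


# Invented sample data standing in for db.csv and column_reference.csv.
DB_CSV = (
    "exported data,,,\n"
    "Ticker,Fund Name,Asset Class,Notes\n"
    "AAA,Example Fund A,Equity,unused\n"
    "BBB,Example Fund B,Bond,unused\n"
)
COLUMN_REFERENCE_CSV = "Ticker,symbol\nFund Name,name\nAsset Class,asset_class\n"


def build_column_plan(headers, name_map):
    """Return (attribute, index) pairs; raise if a mapped header is missing."""
    position = {header: i for i, header in enumerate(headers)}
    missing = [old for old, _ in name_map if old not in position]
    if missing:
        raise ValueError(f"headers not found in db.csv: {missing}")
    return [(new, position[old]) for old, new in name_map]


rows = list(csv.reader(io.StringIO(DB_CSV)))
name_map = [tuple(r[:2]) for r in csv.reader(io.StringIO(COLUMN_REFERENCE_CSV)) if r]

# The header row is the second row, as in the code above.
plan = build_column_plan(rows[1], name_map)

# One dictionary lookup per header up front, then plain list indexing per row.
etfs = [ETF(**{attr: line[i] for attr, i in plan}) for line in rows[2:]]
```

Because the plan is built once, the per-row work is a single pass over the mapped columns, which keeps the cost close to that of the nested-loop version while making a misspelled header fail with a clear message.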