Reading cread object twice causes 'IndexError: list index out of range'

When I read a csv reader object twice, I get the error 'IndexError: list index out of range'. Right now I'm creating a dictionary by iterating over the object, but it fails when I try to create a similar list from the same object. Other code blocks are omitted for brevity; here is the pertinent code:

# Parse csv files for samples, creating a dictionary of key, value pairs and multiple lists.
with open('genes.csv') as f:
    cread = csv.reader(f, delimiter = '\t')
    sample_1_dict = {i: float(j) for i, j in cread}
    sample_1_list = [x for x in sample_1_dict.items()]
    sample_1_genes_sorted = sorted(sample_1_list, key=lambda expvalues: expvalues[0])
    sample_1_values_sorted = sorted(sample_1_list, key=lambda expvalues: expvalues[1])
    sample_1_genes = [i for i, j in sample_1_values_sorted]
    sample_1_values = [j for i, j in sample_1_values_sorted]
    sample_1_graph_un = [float(j) for i, j in cread]

...

sample_values_list = [i for i in sample_1_graph_un, sample_2_graph_un, sample_3_graph_un, sample_4_graph_un, sample_5_graph_un, sample_6_graph_un]

sample_graph_list_un = [[i for i in sample_value] for sample_value in sample_values_list]

colors = 'bgrcmy'
alphas = ['0.5', '0.5', '0.5', '0.5', '0.5', '0.5']
labels = ['278', '470', '543', '5934', '6102', '17163']

for graph, color, alpha, label in zip(sample_graph_list_un, colors, alphas, labels):
    plt.hist(graph, bins = 21, histtype = 'stepfilled', normed = True, color = color, alpha = float(alpha), label=label)

I'm resorting to reopening the csv file, and the following code does work:

# Parse csv files for samples, creating a dictionary of key, value pairs and multiple lists.
with open('genes.csv') as f:
    cread = csv.reader(f, delimiter = '\t')
    sample_1_dict = {i: float(j) for i, j in cread}
    sample_1_list = [x for x in sample_1_dict.items()]
    sample_1_genes_sorted = sorted(sample_1_list, key=lambda expvalues: expvalues[0])
    sample_1_values_sorted = sorted(sample_1_list, key=lambda expvalues: expvalues[1])
    sample_1_genes = [i for i, j in sample_1_values_sorted]
    sample_1_values = [j for i, j in sample_1_values_sorted]

...

with open('genes.csv') as f:
    cread = csv.reader(f, delimiter = '\t')
    sample_1_graph_un = [float(j) for i, j in cread]

sample_values_list = [i for i in sample_1_graph_un, sample_2_graph_un, sample_3_graph_un, sample_4_graph_un, sample_5_graph_un, sample_6_graph_un]

sample_graph_list_un = [[i for i in sample_value] for sample_value in sample_values_list]
colors = 'bgrcmy'
alphas = ['0.5', '0.5', '0.5', '0.5', '0.5', '0.5']
labels = ['278', '470', '543', '5934', '6102', '17163']

for graph, color, alpha, label in zip(sample_graph_list_un, colors, alphas, labels):
    plt.hist(graph, bins = 21, histtype = 'stepfilled', normed = True, color = color, alpha = float(alpha), label=label)

The only difference between the two code examples is which of the two 'with' blocks contains the statement below:

sample_1_graph_un = [float(j) for i, j in cread] 

You cannot read from a file, or from a csv.reader() object, twice without reopening the file or rewinding it to the start.

Files are like a tape; as you read, the file position advances until it reaches the end. After that, further attempts to read return no data.

To rewind a file, use the .seek() method:

f.seek(0)
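
A minimal sketch of the rewind, assuming an in-memory io.StringIO as a stand-in for genes.csv (the two sample rows are made up for illustration). It shows the second pass coming back empty, then working after f.seek(0). Building a fresh csv.reader after the rewind is a cautious choice here, not something the answer above requires.

```python
import csv
import io

# Hypothetical stand-in for genes.csv: tab-separated gene/value pairs.
f = io.StringIO("geneA\t1.5\ngeneB\t0.25\n")

cread = csv.reader(f, delimiter='\t')
first_pass = {gene: float(value) for gene, value in cread}

# The reader has consumed the whole file, so a second pass yields nothing.
exhausted = [float(value) for gene, value in cread]

# Rewind the underlying file, then read it again with a fresh reader.
f.seek(0)
cread = csv.reader(f, delimiter='\t')
second_pass = [float(value) for gene, value in cread]
```

The empty list from the exhausted reader is what later code trips over when it expects data to be there.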

Note that your code does a lot of extra work that is not needed. [i for i in ...] merely loops over the input sequence and builds a copy of it, and no copies are needed here.

In fact, you don't need to read anything twice. The code can be simplified to:
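
To make the point about [i for i in ...] concrete, here is a small sketch with made-up values standing in for the per-sample lists. The variable names copied, direct and nested_copy are illustrative only. Note that Python 3 requires parentheses around the items being iterated, unlike the Python 2 form used in the question.

```python
# Hypothetical values standing in for two of the per-sample lists.
sample_1_graph_un = [1.5, 0.25]
sample_2_graph_un = [2.0, 3.5]

# The comprehension only copies the outer sequence...
copied = [i for i in (sample_1_graph_un, sample_2_graph_un)]

# ...so writing the list directly gives the same result.
direct = [sample_1_graph_un, sample_2_graph_un]

# The nested comprehension likewise just copies each inner list.
nested_copy = [[i for i in sample] for sample in direct]
```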

sample_graph_list_un = []

with open('genes.csv') as f:
    cread = csv.reader(f, delimiter = '\t')
    key_values = [(i, float(j)) for i, j in cread]
    sample_genes = sorted(k for k, v in key_values)
    sample_values = [v for k, v in key_values]  # unsorted for appending first
    sample_graph_list_un.append(sample_values)
    sample_values = sorted(sample_values)       # sorted() creates a copy

Note how the code appends to the sample_graph_list_un list; there is absolutely no need for you to build 6 separately named lists from separate csv files and then later combine them into one list here.

I didn't see how you used the sorted _genes and _values lists, so I included them in the code but didn't append them anywhere. Use them in a similar vein, or remove the lines with sorted() entirely if you do not need these lists.

There's a very simple, and very general, solution to this.

Any time you have an iterator (a file, a CSV reader, a generator, whatever) that you want to iterate over multiple times, you can just toss it in a list:
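
As a sketch of how the append-per-file idea could extend to several samples, assuming each sample lives in its own tab-separated file: the helper load_sample_values, the file names, and the rows below are all hypothetical, and the files are written to a temporary directory only so the example runs on its own.

```python
import csv
import os
import tempfile

def load_sample_values(path):
    """Return the float values from one tab-separated gene/value file, in file order."""
    with open(path, newline='') as f:
        return [float(value) for gene, value in csv.reader(f, delimiter='\t')]

# Hypothetical stand-in data: two tiny sample files in a temp directory.
tmpdir = tempfile.mkdtemp()
sample_rows = {
    'sample_a.csv': [('geneA', 1.5), ('geneB', 0.25)],
    'sample_b.csv': [('geneA', 2.0), ('geneB', 3.5)],
}
for name, rows in sample_rows.items():
    with open(os.path.join(tmpdir, name), 'w', newline='') as f:
        csv.writer(f, delimiter='\t').writerows(rows)

# One list of values per file, collected directly instead of via named variables.
sample_graph_list_un = [
    load_sample_values(os.path.join(tmpdir, name)) for name in sorted(sample_rows)
]
```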

with open('genes.csv') as f:
    cread = list(csv.reader(f, delimiter = '\t'))

Then the rest of your code can remain unchanged (or you can pull it outside of the with statement):

sample_1_dict = {i: float(j) for i, j in cread}
sample_1_list = [x for x in sample_1_dict.items()]
sample_1_genes_sorted = sorted(sample_1_list, key=lambda expvalues: expvalues[0])
sample_1_values_sorted = sorted(sample_1_list, key=lambda expvalues: expvalues[1])
sample_1_genes = [i for i, j in sample_1_values_sorted]
sample_1_values = [j for i, j in sample_1_values_sorted]
sample_1_graph_un = [float(j) for i, j in cread]
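
A compact, self-contained sketch of the same idea, assuming an io.StringIO with made-up rows in place of genes.csv. Once the rows are in a list, any number of passes over them works.

```python
import csv
import io

# Hypothetical stand-in for genes.csv.
f = io.StringIO("geneB\t0.25\ngeneA\t1.5\n")

# Materialize the rows once; a list can be iterated as often as needed.
rows = list(csv.reader(f, delimiter='\t'))

sample_dict = {gene: float(value) for gene, value in rows}
values_in_file_order = [float(value) for gene, value in rows]
genes_sorted = sorted(gene for gene, value in rows)
```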

The downside of doing this is that you have to build an otherwise-unnecessary list, and you can't start your processing until you've read the whole file. If you can write your whole algorithm as a sequence of one-pass transformations from one iterator to another (e.g., generator expressions), that's a huge win.

But in your case, you're already building up a number of lists and dicts, and you can't get to the second one until you've read the whole file, so there's really no cost to building the list at the start.
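
For contrast, here is a rough sketch of the one-pass style mentioned above, again using an io.StringIO with made-up rows. The pipeline (parse, convert, filter, sum) is only an illustration and is not part of the original code. Nothing is read until sum() pulls values through the chain, and no intermediate list is built.

```python
import csv
import io

# Hypothetical stand-in for genes.csv.
f = io.StringIO("geneA\t1.5\ngeneB\t0.25\ngeneC\t2.0\n")

rows = csv.reader(f, delimiter='\t')                  # lazy: one row at a time
values = (float(value) for gene, value in rows)       # lazy conversion
large_values = (v for v in values if v >= 1.0)        # lazy filter

# Consuming the final iterator drives the whole chain in a single pass.
total = sum(large_values)

# The chain is now exhausted, like any other iterator.
leftover = list(values)
```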

