csv.reader only reading in one line

I am pretty new to Python. I am trying to process data in a very large .csv file (~6.8 million lines). Example lines look like this:

Group1.1    57645   0.0954454545
Group1.1    57662   0.09556544778
Group1.13   500     0.357114538
Group1.13   504     0.320618298
Group1.13   2370    0.483851368
Group1.14   42      0.5495688

The first column is the group, the second is the position, and the third is the value I read in to run a calculation on. I am trying to perform these calculations in a "sliding window" based on position. Each group has to be calculated separately, because the position numbering restarts for each group. In my code, I first read the group IDs into a list before doing anything else, "uniqify" that list, and then use it so that the sliding window runs over only one specific group at a time. I then move to the next group ID in the unique list and run the calculation again. Here are the basics of my code (the unique1 function is a simple method to uniqify a list):

for row in reader:
    scaffolds.append(row[0])
    unique1(scaffolds)
    newfile.seek(0)
    reader=csv.reader((line.replace('\0','') for line in newfile), delimiter="\t")
    if row[0] == unique_scaffolds[i]:
        #...perform the calculations
    else:
        i+=1
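The per-group sliding window described above can also be done in a single pass over the file, without collecting and uniqifying the group IDs first. This is a minimal sketch under stated assumptions: each group's rows are contiguous and sorted by position (as in the sample), and because the question does not say what is calculated, it uses a hypothetical window mean with hypothetical window and step parameters. The helper names read_rows and sliding_window_means are illustrative, not from the question.

```python
import csv
import io
from bisect import bisect_left
from itertools import groupby
from operator import itemgetter


def read_rows(lines):
    """Yield (group, position, value) from tab-delimited lines, dropping NUL bytes."""
    cleaned = (line.replace('\0', '') for line in lines)
    for row in csv.reader(cleaned, delimiter="\t"):
        if row:  # skip blank lines
            yield row[0], int(row[1]), float(row[2])


def sliding_window_means(rows, window=1000, step=500):
    """Yield (group, window_start, mean_value) for every non-empty window.

    Assumes each group's rows are contiguous and sorted by position.
    `window` and `step` are hypothetical parameters, in position units.
    """
    for group, group_rows in groupby(rows, key=itemgetter(0)):
        # Only one group is held in memory at a time.
        points = [(pos, val) for _, pos, val in group_rows]
        positions = [pos for pos, _ in points]
        values = [val for _, val in points]
        start = positions[0]
        while start <= positions[-1]:
            # Binary search for the rows with start <= position < start + window.
            lo = bisect_left(positions, start)
            hi = bisect_left(positions, start + window)
            if hi > lo:
                yield group, start, sum(values[lo:hi]) / (hi - lo)
            start += step


# Usage on a small in-memory file standing in for the real .csv file.
data = io.StringIO("Group1.1\t57645\t0.25\nGroup1.1\t57662\t0.75\nGroup1.13\t500\t0.125\n")
for result in sliding_window_means(read_rows(data), window=100, step=100):
    print(result)
# ('Group1.1', 57645, 0.5)
# ('Group1.13', 500, 0.125)
```

If the groups are not contiguous in the file, itertools.groupby treats each run as a separate group, so the file would need to be sorted by group first.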

My problem is that it only reads the very first line of my data set and nothing more. If I insert a "print row" right after "for row in reader", I get output like this:

['Group1.1', '424', '0.082048032']

If I write this exact same code without any of the calculations and loops that follow, it prints every single row in the data set. In this situation, how can I read in every line at the beginning of this loop?

Thanks in advance for any suggestions or input. If I am not being clear enough, let me know and I can try to explain further. Thanks!

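You are re-initializing reader and rewinding the file with newfile.seek(0) on every pass through the loop, which keeps sending it back to the first line. Create the reader once, before the loop, and don't rewind inside it. Try this:

import csv

reader = csv.reader((line.replace('\0', '') for line in newfile), delimiter="\t")
for row in reader:
    scaffolds.append(row[0])
    unique1(scaffolds)
    # Don't call newfile.seek(0) here: rewinding the file inside the loop
    # makes the reader start over at the first line.
    if row[0] == unique_scaffolds[i]:
        pass  # ...perform the calculations
    else:
        i += 1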
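The "stuck on the first line" behaviour can be reproduced on a small in-memory file. A for loop keeps pulling from the iterator it started with, so reassigning reader on its own does not change what that loop iterates over; the newfile.seek(0) call, however, rewinds the file the reader is reading from, so every next row is the first line again. In this sketch, io.StringIO stands in for the question's newfile, itertools.islice caps the otherwise endless loop, and read_positions is a hypothetical name.

```python
import csv
import io
from itertools import islice

# In-memory stand-in for the question's newfile (tab-delimited).
newfile = io.StringIO("Group1.1\t1\t0.1\nGroup1.1\t2\t0.2\nGroup1.13\t3\t0.3\n")


def read_positions(rewind, limit=5):
    """Return column 1 of up to `limit` rows from a single csv.reader.

    With rewind=True the file is rewound after every row, as in the question.
    """
    newfile.seek(0)
    reader = csv.reader((line.replace('\0', '') for line in newfile), delimiter="\t")
    positions = []
    for row in islice(reader, limit):  # islice caps the otherwise endless loop
        positions.append(row[1])
        if rewind:
            newfile.seek(0)  # back to the first line on every iteration
    return positions


print(read_positions(rewind=True))   # ['1', '1', '1', '1', '1']
print(read_positions(rewind=False))  # ['1', '2', '3']
```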

It looks to me like you're replacing your reader object inside the loop. Fix that (or get rid of it), and you'll probably have better luck getting this to work.

Keep in mind that csv.reader only reads one line at a time. You will have to build your own list by reading the lines in, one at a time.
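A minimal sketch of that idea for the group IDs the question needs: read each row once, take column 0, and deduplicate at the end, instead of calling unique1 (whose code is not shown) on the growing list after every row. unique_groups is a hypothetical name, and dict.fromkeys is used here as an order-preserving stand-in for unique1.

```python
import csv
import io


def unique_groups(f):
    """Return the group IDs (column 0) of a tab-delimited file in first-seen order.

    The file is read once, one row at a time; dict.fromkeys drops duplicates
    while keeping insertion order (guaranteed since Python 3.7).
    """
    rows = csv.reader((line.replace('\0', '') for line in f), delimiter="\t")
    return list(dict.fromkeys(row[0] for row in rows if row))


sample = io.StringIO(
    "Group1.1\t57645\t0.0954454545\n"
    "Group1.1\t57662\t0.09556544778\n"
    "Group1.13\t500\t0.357114538\n"
    "Group1.14\t42\t0.5495688\n"
)
print(unique_groups(sample))  # ['Group1.1', 'Group1.13', 'Group1.14']
```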

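Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.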

 
© 2020-2024 STACKOOM.COM