
How can I use Python to read only a certain range of lines in a .csv?

I am trying to process a huge .csv file, but I don't need the first ~900000 rows of data. This is how I was originally trying to get rid of that chunk of data, but it makes the program take forever to finish. Is there a more straightforward way to do this, where I don't read those first 900000 rows in the first place?

firstColumn = [ ]
secondColumn = [ ]
thirdColumn = [ ]

readFile  = input("Enter name of file to be read: ")

with open(readFile,'r') as readFile:

    for eachline in readFile:                               # converting columns to lists
        parts = eachline.strip('\n').split(',')
        firstColumn.append(parts[0])
        secondColumn.append(parts[1])
        thirdColumn.append(parts[2])    
    
for j in range(900000):                          # nothing happens for these datapoints
    del firstColumn[j]
    del secondColumn[j]
    del thirdColumn[j]

You can skip the initial lines by doing something like this:

with open(readFile, 'r') as f:
    # skip first 900,000 lines
    for _ in range(900000):
        next(f)
    for line in f:
        parts = line.strip('\n').split(',')
        firstColumn.append(parts[0])
        secondColumn.append(parts[1])
        thirdColumn.append(parts[2])
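
The same skip can also be written with itertools.islice, which avoids the explicit counting loop. This is only a sketch: the function name read_columns_after and its ncols parameter are made up for illustration, and it assumes plain comma-separated rows with no quoted fields.

```python
from itertools import islice


def read_columns_after(path, skip, ncols=3):
    """Return ncols lists holding the columns of every row after the first `skip` lines.

    Hypothetical helper, not part of the original answer.
    """
    columns = [[] for _ in range(ncols)]
    with open(path, "r") as f:
        # islice reads and discards the first `skip` lines, then yields the rest.
        # Unlike repeated next(f), it does not raise if the file is shorter than `skip`.
        for line in islice(f, skip, None):
            parts = line.rstrip("\n").split(",")
            # zip stops at the shorter of the two, so extra fields are ignored
            for column, value in zip(columns, parts):
                column.append(value)
    return columns
```

Calling firstColumn, secondColumn, thirdColumn = read_columns_after(readFile, 900000) would fill the three lists from the question. The skipped lines are still read from disk, but they are never split or stored.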

You're right; that's awful. It's silly to convert 900K lines of input that you don't intend to use. Instead, skip past them entirely:

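# read past the first 900K lines without parsing them
with open(readFile, 'r') as f:
    for _ in range(900000):
        f.readline()

    for eachline in f:                               # converting columns to lists
        # continue as before
        parts = eachline.strip('\n').split(',')
        firstColumn.append(parts[0])
        secondColumn.append(parts[1])
        thirdColumn.append(parts[2])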

With that done, I strongly recommend that you switch to a csv reader to grab the rest of the file; you can build your data frame in one simple operation from there. Be careful that you do not close and reopen the file, or otherwise reset the file position, or reading will start again from the first line.
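
The csv reader suggestion could look something like this. It is a minimal sketch rather than the answerer's own code: the function name read_rows_after is made up for illustration, and the skip is applied to the reader itself, so it counts parsed records rather than physical lines (the two differ only when a quoted field contains a line break).

```python
import csv
from itertools import islice


def read_rows_after(path, skip):
    """Return the rows after the first `skip` records, parsed by csv.reader.

    Hypothetical helper, not part of the original answer.
    """
    # newline="" is the setting the csv module documentation recommends
    with open(path, "r", newline="") as f:
        reader = csv.reader(f)
        # Skip on the same open file object; reopening it would start over at row 1.
        rows = list(islice(reader, skip, None))
    return rows
```

If every remaining row has the same number of fields, zip(*rows) turns the list of rows into one tuple per column.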

You could use pandas, which can write a copy of the .csv file with those rows removed. First load the file into a DataFrame, then use .iloc[] with the row index you want to start from, which is the first row after the ones you want to cut. Slice it as you would a string.
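
A rough sketch of the pandas approach follows. The function names are hypothetical, and header=None assumes the file has no header row, as the question's code suggests. The first function slices with .iloc[] after loading everything; the second uses the skiprows parameter of read_csv, which drops the leading lines before they are parsed.

```python
import pandas as pd


def drop_leading_rows(df, skip):
    """Return df without its first `skip` rows, using positional slicing."""
    # .iloc[skip:] slices by position, the same way text[skip:] slices a string
    return df.iloc[skip:].reset_index(drop=True)


def write_trimmed_copy(src, dst, skip):
    """Write a copy of the csv at `src` to `dst` without its first `skip` lines."""
    # An integer skiprows tells read_csv to skip that many lines at the start of the file
    df = pd.read_csv(src, header=None, skiprows=skip)
    df.to_csv(dst, header=False, index=False)
    return df
```

Slicing with .iloc[] still loads all 900000 unwanted rows into memory first, so for a file this size skiprows is probably the cheaper of the two.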
