
How to read a fixed chunk of lines (say 100) from stdin in Python?

I want to read the first 100 lines from stdin, convert them into a dataframe, and do some processing on it. Then I want to read the next 100 lines (101-200) from stdin, convert them into a dataframe, process it, and so forth.

readlines() in Python doesn't take an argument specifying the number of lines to read.

readLines() in R supports this, but I have not been able to do the same in Python.

I would appreciate any help with this.

Try using sys.stdin. It has a file interface, true to the Unix philosophy, which means you can iterate over it to get lines. After that, you just have to slice it like any other iterator. I'd suggest itertools: https://docs.python.org/2/library/itertools.html

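import sys
import itertools

CHUNK_LENGTH = 200  # number of lines per chunk; use 100 for the case in the question

# islice is lazy: lines are only read from stdin as lines_chunk is consumed
lines_chunk = itertools.islice(sys.stdin, CHUNK_LENGTH)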
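
The islice call above yields only a single chunk. To process stdin chunk by chunk, one option is to call it repeatedly until it returns nothing. The following is a minimal sketch under stated assumptions: iter_chunks is a hypothetical helper name, and io.StringIO stands in for sys.stdin so the example can run without piped input.

```python
import io
import itertools


def iter_chunks(stream, chunk_length):
    """Yield successive lists of at most chunk_length lines from stream."""
    while True:
        # list() forces islice to actually read the next chunk_length lines
        chunk = list(itertools.islice(stream, chunk_length))
        if not chunk:
            return  # the stream is exhausted
        yield chunk


# io.StringIO is a stand-in for sys.stdin (an assumption for the demo)
fake_stdin = io.StringIO("".join("row %d\n" % i for i in range(250)))
chunk_sizes = [len(chunk) for chunk in iter_chunks(fake_stdin, 100)]
print(chunk_sizes)  # [100, 100, 50]
```

In real use you would pass sys.stdin instead of fake_stdin and build the dataframe from each chunk inside the loop. Unlike a padded grouper, the last chunk is simply shorter when the input is not a multiple of the chunk length.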

Better yet, use the grouper recipe from the itertools documentation (see the link above) to get an iterable of chunks:

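from itertools import izip_longest  # Python 2 name; in Python 3 it is itertools.zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

# The last chunk is padded with "" when the input is not a multiple of CHUNK_LENGTH
chunks_of_200 = grouper(sys.stdin, CHUNK_LENGTH, fillvalue="")
for chunk_of_200 in chunks_of_200:
    pass  # do something with chunk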
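
The recipe above uses izip_longest, which exists only in Python 2, and it pads the final chunk with the fill value. A possible Python 3 variant is sketched below, assuming you would rather drop the padding than keep it. The names grouper_py3 and _PAD are hypothetical, and io.StringIO stands in for sys.stdin.

```python
import io
from itertools import zip_longest

_PAD = object()  # unique sentinel, so real lines can never be mistaken for padding


def grouper_py3(iterable, n):
    """Yield lists of up to n items; the final list may be shorter."""
    # n references to the same iterator: each output tuple takes n consecutive items
    args = [iter(iterable)] * n
    for group in zip_longest(*args, fillvalue=_PAD):
        # strip the padding that zip_longest adds to the last group
        yield [item for item in group if item is not _PAD]


# io.StringIO is a stand-in for sys.stdin (an assumption for the demo)
demo = io.StringIO("a\nb\nc\nd\ne\n")
groups = list(grouper_py3(demo, 2))
print(groups)  # [['a\n', 'b\n'], ['c\n', 'd\n'], ['e\n']]
```

Using a sentinel object rather than "" as the fill value means an empty string in the data would not be removed by accident.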

If you want vanilla Python 3, you could do:

import sys
lines = [line for _, line in zip(range(200), sys.stdin)]
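
The order of the arguments to zip matters if you repeat this for later chunks. zip pulls from its arguments left to right, so with range first it stops as soon as the range runs out, before reading another line. With the stream first, one extra line is read and discarded per chunk. A small sketch of both cases, where take is a hypothetical helper name and io.StringIO stands in for sys.stdin:

```python
import io


def take(stream, n):
    """Read up to n lines from stream without consuming any extra line."""
    # range comes first, so zip stops before pulling line n+1 from the stream
    return [line for _, line in zip(range(n), stream)]


fake_stdin = io.StringIO("1\n2\n3\n4\n5\n")
first = take(fake_stdin, 2)   # ['1\n', '2\n']
second = take(fake_stdin, 2)  # ['3\n', '4\n']
third = take(fake_stdin, 2)   # ['5\n']

# Reversed order: the stream is advanced once more before zip notices
# that range is exhausted, so the line "2\n" is silently lost.
lossy = io.StringIO("1\n2\n3\n")
taken = [line for line, _ in zip(lossy, range(1))]  # ['1\n']
next_line = next(lossy)                             # '3\n', not '2\n'
```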


 