I want to read the first 100 lines from stdin, convert them into a dataframe and do some processing with it. Then read the next 100 lines (101-200) from stdin, convert them into a dataframe, do some processing, and so forth.
readlines() in Python doesn't have an argument to specify the number of lines to be read (its optional hint argument limits the total size of the lines read, not their count).
R's readLines() has this (its n argument), but I can't find a way to do the same in Python.
I'd appreciate any help with this.
Try using sys.stdin. It has a file interface, true to the Unix philosophy, which means you can iterate over it to get lines. After that, you just have to slice it like any other iterator; I'd suggest itertools https://docs.python.org/2/library/itertools.html .
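import sys
import itertools

CHUNK_LENGTH = 100  # lines per chunk, as in the question

# islice returns a lazy iterator; wrap it in list() to materialize the lines
lines_chunk = list(itertools.islice(sys.stdin, CHUNK_LENGTH))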
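The islice call above only takes the first chunk. Because sys.stdin is a single iterator, calling islice on it again resumes where the previous call stopped, so a loop can keep pulling chunks until one comes back empty. Below is a minimal sketch, assuming comma-separated input; the names `read_chunks` and `main` are hypothetical, and `csv.reader` is only a stand-in for building a DataFrame (with pandas installed you could instead pass `io.StringIO("".join(chunk))` to `pandas.read_csv(..., header=None)`).

```python
import csv
import itertools
import sys
from typing import Iterator, List, TextIO

CHUNK_LENGTH = 100  # lines per chunk, as in the question


def read_chunks(stream: TextIO, size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` lines from `stream`."""
    while True:
        # islice resumes from wherever the previous call stopped on the same iterator
        chunk = list(itertools.islice(stream, size))
        if not chunk:  # an empty list means the stream is exhausted
            return
        yield chunk


def main() -> None:
    """Read stdin in chunks of CHUNK_LENGTH lines and handle each one."""
    for chunk in read_chunks(sys.stdin, CHUNK_LENGTH):
        rows = list(csv.reader(chunk))  # stand-in for building a DataFrame
        # replace this with the real per-chunk processing
        print(f"got {len(rows)} rows")
```

To run it, add a call to `main()` and feed it input, e.g. `python script.py < data.csv`; with 250 single-line records it would report chunks of 100, 100 and 50 rows.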
Better yet, use the grouper recipe from the itertools docs (see the link above) to get an iterable of chunks:
from itertools import zip_longest  # izip_longest in Python 2

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

chunks = grouper(sys.stdin, CHUNK_LENGTH, fillvalue="")
for chunk in chunks:
    # the last chunk is padded with "" up to CHUNK_LENGTH; drop the padding
    lines = [line for line in chunk if line]
    # do something with lines
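The grouper recipe pads the final chunk with `fillvalue`, so the padding has to be stripped before building a dataframe. On Python 3.12 and later, `itertools.batched(iterable, n)` yields tuples of up to n items with a shorter final batch and no padding. The sketch below uses it when available and otherwise falls back to a small islice-based generator of my own that behaves the same way for this use.

```python
import itertools
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")

try:
    from itertools import batched  # available in Python 3.12+
except ImportError:
    def batched(iterable: Iterable[T], n: int) -> Iterator[Tuple[T, ...]]:
        """Fallback: yield tuples of up to n items; the last one may be shorter."""
        if n < 1:
            raise ValueError("n must be at least one")
        it = iter(iterable)
        # an empty tuple from islice means the input is exhausted
        while batch := tuple(itertools.islice(it, n)):
            yield batch
```

Usage would then be `for chunk in batched(sys.stdin, 100): ...`, where each chunk is a tuple of lines with nothing to filter out.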
If you want plain Python 3 without itertools, you could do
import sys
lines = [line for _, line in zip(range(100), sys.stdin)]  # at most 100 lines
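The argument order in that zip call matters: zip pulls from its arguments left to right, so with range first it stops as soon as the range runs out, without consuming an extra line from stdin (reversing the arguments would silently drop one line per chunk). To read every chunk, repeat the comprehension until it returns an empty list. A minimal sketch; the function name `chunks_via_zip` is hypothetical.

```python
from typing import Iterator, List, TextIO


def chunks_via_zip(stream: TextIO, size: int) -> Iterator[List[str]]:
    """Yield lists of at most `size` lines using only zip and range."""
    while True:
        # range comes first so zip stops before pulling an extra line from stream
        lines = [line for _, line in zip(range(size), stream)]
        if not lines:  # nothing left to read
            return
        yield lines
```

Calling `chunks_via_zip(sys.stdin, 100)` in a for loop gives the 100-line chunks the question asks for.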