
How to get the first N lines from sys.stdin line by line in Python

I ran into a problem while writing a reducer for MapReduce. I want to read only the first 10 lines of a very large file, so I used a for loop with a break. However, the break causes an error on Hadoop, so I'm looking for an alternative:

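import fileinput

limit = 10    # assumed value (not shown in the original post): lines to print
counter = 0   # lines printed so far

for line in fileinput.input():
    if counter >= limit:
        break

    line = line.strip()
    print(line)
    counter += 1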

Error log:

Error: java.io.IOException: subprocess exited successfully
R/W/S=6936/19/0 in:NA [rec/s] out:NA [rec/s]
minRecWrittenToEnableSkip_=9223372036854775807 HOST=null
USER=s2132211
HADOOP_USER=null
last tool output: |29670    YOU HAVE AATO|
Broken pipe
    at org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:129)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

First, either your example is badly formatted or you have a logical error: print(line) and counter += 1 must be inside the for loop.

A simpler way to write this is:

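import fileinput

limit = 10  # assumed value (not shown in the original post): lines to print

# enumerate() counts from 0, so stop once `limit` lines have been printed
for counter, line in enumerate(fileinput.input()):
    if counter >= limit:
        break

    line = line.strip()
    print(line)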
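
As another way to express "take the first N lines", here is a minimal sketch using itertools.islice from the standard library. The helper name first_lines and the in-memory sample input are hypothetical and only for illustration; in a reducer you would pass sys.stdin as the stream. Note that this still stops reading early, just like break, so it does not by itself change how Hadoop reacts.

```python
import io
import itertools


def first_lines(stream, limit):
    """Return up to `limit` lines from `stream`, stripped of surrounding whitespace."""
    # islice stops pulling from the stream after `limit` items,
    # so the rest of a very large input is never read.
    return [line.strip() for line in itertools.islice(stream, limit)]


# Hypothetical sample input standing in for sys.stdin.
sample = io.StringIO("first\nsecond\nthird\n")
for line in first_lines(sample, 2):
    print(line)
```

In a real streaming job the call would look like first_lines(sys.stdin, 10), with import sys added.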

If this doesn't fix the issue, a few questions:

1) Can you see any output from the program (is it actually printing anything from that for loop)?

2) Does the program crash immediately, or after some period of time?
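
One possible reading of the error log, offered as an assumption rather than a confirmed diagnosis: the "Broken pipe" together with "subprocess exited successfully" suggests that Hadoop Streaming was still writing records to the reducer when the script had already left the loop and exited. If that is the cause, a workaround is to stop printing after the limit but keep reading the input to the end. The sketch below illustrates the idea; the helper name print_head_and_drain and its parameters are hypothetical.

```python
import io
import sys


def print_head_and_drain(stream, limit, out=sys.stdout):
    """Write the first `limit` lines of `stream` to `out`, then read the rest silently."""
    for counter, line in enumerate(stream):
        if counter < limit:
            out.write(line.strip() + "\n")
        # Past the limit nothing is written, but the loop keeps iterating
        # so the input is consumed to the end instead of the pipe being
        # closed while the parent process is still writing to it.


# Hypothetical sample input standing in for sys.stdin.
sample = io.StringIO("a\nb\nc\nd\n")
print_head_and_drain(sample, 2)
```

In a reducer the call would be print_head_and_drain(sys.stdin, 10). The trade-off is that the script still iterates over every input line, although it does no work on the lines past the limit.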

