Python csv reader: how to pipe output to another script using the command line

I have two scripts, a mapper and a reducer. Both read their input from stdin with csv.reader. The mapper should take its input from a tab-delimited text file, dataset.csv, and the reducer's input should be the mapper's output. I want to save the reducer's output to a text file, output.txt. What is the correct chain of commands to do this?

mapper:

#!/usr/bin/python

import sys, csv

reader = csv.reader(sys.stdin, delimiter='\t')
writer = csv.writer(sys.stdout, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)

for line in reader:
    if len(line) > 5:  # parse only lines in the forum_node.tsv file
        if line[5] == 'question':
            _id = line[0]
            student = line[3]  # author_id
        elif line[5] != 'node_type':
            _id = line[7]
            student = line[3]  # author_id
        else:
            continue  # ignore header

        # emit "thread_id<TAB>author_id" for the reducer
        print('{0}\t{1}'.format(_id, student))
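To try the mapper's row handling without a file, the loop body can be pulled into a generator that accepts any iterable of already-split rows. This is a minimal sketch: map_forum_nodes is a hypothetical name, the sample rows are made up for illustration, and the column positions (5 for the node type, 0 for a question's own ID, 7 for the ID used by other nodes, 3 for author_id) are simply the ones the mapper above uses.

```python
import csv
import io


def map_forum_nodes(rows):
    """Yield (thread_id, author_id) pairs using the same rules as mapper.py.

    Hypothetical helper: column indices mirror the mapper above
    (5 = node type, 0 = question id, 7 = id for other nodes, 3 = author_id).
    """
    for row in rows:
        if len(row) <= 5:
            continue  # too few columns to be a forum_node row
        if row[5] == 'question':
            yield row[0], row[3]
        elif row[5] != 'node_type':
            yield row[7], row[3]
        # a row whose node type column reads 'node_type' is the header: skipped


# Usage with made-up sample data parsed from a string instead of stdin.
sample = (
    "id\ta\tb\tauthor_id\tc\tnode_type\td\te\n"
    "1\tq\tt\t100\tbody\tquestion\t\t\n"
    "2\ta\tt\t200\tbody\tanswer\t1\t1\n"
)
pairs = list(map_forum_nodes(csv.reader(io.StringIO(sample), delimiter='\t')))
print(pairs)  # [('1', '100'), ('1', '200')]
```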

reducer:

#!/usr/bin/python

import sys, csv

reader = csv.reader(sys.stdin, delimiter='\t')
writer = csv.writer(sys.stdout, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)

oldID = None
students = []

for line in reader:
    if len(line) != 2:
        continue

    thisID, thisStudent = line

    # a new thread ID begins: print the finished group for the previous one
    if oldID is not None and oldID != thisID:
        print('Thread: {0}, students: {1}'.format(oldID, ', '.join(students)))
        students = []

    oldID = thisID
    students.append(thisStudent)

# print the last group
if oldID is not None:
    print('Thread: {0}, students: {1}'.format(oldID, ', '.join(students)))
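The reducer's oldID/students bookkeeping is the standard "group consecutive equal keys" pattern, so it can also be expressed with itertools.groupby. This is a minimal sketch with a hypothetical reduce_threads function over (thread_id, student) pairs; like the loop above, it only merges items that are adjacent, so it assumes its input is already sorted (or at least grouped) by thread ID.

```python
import itertools
import operator


def reduce_threads(pairs):
    """Yield one 'Thread: ..., students: ...' line per run of equal thread IDs.

    groupby only merges adjacent items with the same key, so pairs are
    assumed to be grouped by thread ID already, just as in reducer.py.
    """
    for thread_id, group in itertools.groupby(pairs, key=operator.itemgetter(0)):
        students = [student for _, student in group]
        yield 'Thread: {0}, students: {1}'.format(thread_id, ', '.join(students))


for out in reduce_threads([('1', '100'), ('1', '200'), ('7', '300')]):
    print(out)
# Thread: 1, students: 100, 200
# Thread: 7, students: 300

# Ungrouped input splits thread 1 into two groups; sorting first merges them.
unsorted = [('1', '100'), ('7', '300'), ('1', '200')]
print(list(reduce_threads(unsorted)))
print(list(reduce_threads(sorted(unsorted))))
```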

Pipe the scripts together:

python mapper.py < dataset.csv | python reducer.py > output.txt

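The < dataset.csv gives mapper.py the CSV file on its stdin, and the | connects mapper.py's stdout to the stdin of the next command. That next command is python reducer.py, and > output.txt redirects that script's stdout to output.txt.

Note that reducer.py only merges lines that arrive one after another with the same thread ID. If the mapper's output is not already grouped by ID, sort it between the two scripts (Hadoop streaming does this for you between the map and reduce phases):

python mapper.py < dataset.csv | sort -k1,1 | python reducer.py > output.txt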
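If the pipeline ever needs to be driven from Python rather than a shell, the same <, | and > wiring can be reproduced with subprocess.Popen, following the "replacing shell pipeline" pattern in the subprocess documentation. This is a minimal sketch: run_pipeline is a hypothetical helper, the script and file names are whatever you pass in, and sys.executable stands in for the python command.

```python
import subprocess
import sys


def run_pipeline(mapper_path, reducer_path, input_path, output_path):
    """Python equivalent of: python mapper.py < input | python reducer.py > output.

    Returns the (mapper, reducer) exit statuses.
    """
    with open(input_path, 'rb') as src, open(output_path, 'wb') as dst:
        # "< input": the mapper reads the input file on its stdin
        mapper = subprocess.Popen([sys.executable, mapper_path],
                                  stdin=src, stdout=subprocess.PIPE)
        # "|" and "> output": the reducer reads the mapper's stdout
        # and writes its own stdout straight into the output file
        reducer = subprocess.Popen([sys.executable, reducer_path],
                                   stdin=mapper.stdout, stdout=dst)
        # Close our copy of the pipe so the mapper gets a broken pipe
        # if the reducer exits early
        mapper.stdout.close()
        reducer_status = reducer.wait()
        mapper_status = mapper.wait()
    return mapper_status, reducer_status
```

Usage would be run_pipeline('mapper.py', 'reducer.py', 'dataset.csv', 'output.txt'). The processes run concurrently and stream through an OS pipe, so the mapper's output is never held in memory as a whole.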
