
piping from shell into jupyter notebook cell

Does anyone know how to stream the output of a shell command (a chain of csvkit tool invocations) into a Jupyter notebook cell, specifically into a Pandas DataFrame? In the cell it would look something like this:

 output = !find /path -name "*.csv" | csvstack ... | csvgrep ... 
 df = pd.read_csv(output)

only the above doesn't really work. The output of the shell command is very large (millions of rows), which Pandas can handle just fine, but I don't want the output to be loaded into memory in its entirety as a string first.

I'm looking for a piping/streaming solution that allows Pandas to read the output as it comes.

I figured out a workaround. It is not actually piping, but it saves some disk I/O:

import io
import pandas as pd
# `!command` in IPython returns an SList; its .n property is the captured
# output joined into a single newline-separated string.
output = !(your Unix command)
df = pd.read_table(io.StringIO(output.n))
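
Note that output.n still materializes the entire captured output as one in-memory string, so this avoids the temporary file but not the memory cost.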

IIUC you can do it by letting pandas read from STDIN:

Python script:

import sys
import pandas as pd
df = pd.read_csv(sys.stdin)
print(df)

Shell command line:

!find /path -name "*.csv" | csvstack ... | csvgrep ... | python our_pyscript.py

Please pay attention to the last part: | python our_pyscript.py
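
If you also want to avoid holding the whole stream in memory inside the script, pandas can consume stdin in chunks. A minimal sketch of what our_pyscript.py could look like, assuming per-chunk processing is enough for your use case (the chunk size of 100000 rows is an arbitrary choice):

import sys
import pandas as pd

# chunksize makes read_csv return an iterator of DataFrames instead of one big frame,
# so only one chunk of the piped CSV is held in memory at a time.
total_rows = 0
for chunk in pd.read_csv(sys.stdin, chunksize=100000):
    total_rows += len(chunk)  # replace with your real per-chunk processing
print(total_rows)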


Perhaps "named pipes" would be useful in your case.

In shell:

# Create a named pipe (FIFO); it behaves like a file but streams data between processes.
mkfifo MYFIFO
# Writing blocks until another process opens the pipe for reading.
head myfile.txt > MYFIFO

In notebook:

with open('MYFIFO', 'rt') as f:
    print(f.readline())
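
Because a named pipe looks like an ordinary file path, pandas can also read from it directly into a DataFrame. A minimal sketch, assuming your csvkit pipeline is redirected into MYFIFO on the shell side (the writer blocks until the reader opens the pipe):

import pandas as pd

# The pipe is opened like a regular file; rows stream through it as the
# shell pipeline produces them, without an intermediate file on disk.
df = pd.read_csv('MYFIFO')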

A few good internet searches should give you the information you need to use named pipes safely and effectively. Good luck!
