Piping from shell into Jupyter notebook cell
Does anyone know how to stream the output of a shell command (a chain of csvkit tool invocations) into a Jupyter notebook cell, and specifically into a Pandas DataFrame? From the cell's content it would look something like this:
output = !find /path -name "*.csv" | csvstack ... | csvgrep ...
df = DataFrame.read_csv(output)
Only the above doesn't really work. The output of the shell command is very large (millions of rows), which Pandas can handle just fine, but I don't want the output to be loaded into memory in its entirety as a string.
I'm looking for a piping/streaming solution that allows Pandas to read the output as it comes.
I figured out a workaround. It is not actually piping, but it saves some disk I/O:
import io
import pandas as pd
output = !(your Unix command)
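# output is an IPython SList; output.n is the captured lines joined by newlines
df = pd.read_csv(io.StringIO(output.n))  # csvkit emits comma-separated text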
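The workaround above still holds the whole output in memory as text before parsing. As a rough sketch of actual streaming, one option is to start the pipeline with `subprocess` instead of `!` and hand the pipe to `pd.read_csv`. The helper name `read_csv_from_command` and the `printf` stand-in command are made up for illustration; the real csvkit pipeline would go in its place.

```python
import subprocess

import pandas as pd


def read_csv_from_command(command, chunksize=100_000):
    """Run a shell pipeline and parse its stdout as CSV while it is produced."""
    proc = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, text=True)
    try:
        # With chunksize set, read_csv returns an iterator of DataFrames,
        # so the raw text is consumed from the pipe piece by piece.
        chunks = pd.read_csv(proc.stdout, chunksize=chunksize)
        df = pd.concat(chunks, ignore_index=True)
    finally:
        proc.stdout.close()
        returncode = proc.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, command)
    return df


# Hypothetical stand-in for the find | csvstack | csvgrep pipeline.
df = read_csv_from_command("printf 'a,b\\n1,2\\n3,4\\n'")
print(df)
```

The final DataFrame still has to fit in memory, but the intermediate string does not.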
IIUC you can do it by letting pandas read from STDIN:
Python script:
import sys
import pandas as pd
df = pd.read_csv(sys.stdin)
print(df)
Shell command line:
!find /path -name "*.csv" | csvstack ... | csvgrep ... | python our_pyscript.py
Please pay attention to the last part: | python our_pyscript.py
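Note that the script runs in its own process, so its `df` does not end up in the notebook kernel. If the script only needs to reduce the data, a chunked read keeps its memory use bounded. Below is a small sketch under that assumption; `count_rows` is a hypothetical helper, and the `io.StringIO` object stands in for `sys.stdin`.

```python
import io

import pandas as pd


def count_rows(stream, chunksize=100_000):
    """Count CSV data rows from a text stream without holding them all in memory."""
    total = 0
    # Each iteration yields a DataFrame of at most `chunksize` rows.
    for chunk in pd.read_csv(stream, chunksize=chunksize):
        total += len(chunk)
    return total


# In our_pyscript.py this would be count_rows(sys.stdin).
demo = io.StringIO("a,b\n1,2\n3,4\n5,6\n")
n = count_rows(demo, chunksize=2)
print(n)
```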
Perhaps "named pipes" would be useful in your case.
In shell:
mkfifo MYFIFO
head myfile.txt > MYFIFO
In notebook:
with open('MYFIFO', 'rt') as f:
    print(f.readline())
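The same idea can be driven entirely from the notebook and fed to pandas. This is only a sketch for Unix-like systems: the temporary directory and the `printf` writer are stand-ins for the real pipeline, and the writer has to run in the background because opening a FIFO blocks until both ends are open.

```python
import os
import subprocess
import tempfile

import pandas as pd

# Create the FIFO in a private temporary directory (os.mkfifo is Unix only).
fifo_dir = tempfile.mkdtemp()
fifo_path = os.path.join(fifo_dir, "MYFIFO")
os.mkfifo(fifo_path)

# Hypothetical stand-in for the csvkit pipeline, started in the background.
writer = subprocess.Popen(
    f"printf 'a,b\\n1,2\\n3,4\\n' > '{fifo_path}'", shell=True
)
try:
    # pandas reads from the pipe as the writer produces data.
    with open(fifo_path, "rt") as f:
        df = pd.read_csv(f)
finally:
    writer.wait()
    os.remove(fifo_path)
    os.rmdir(fifo_dir)

print(df)
```

If the writer fails before it opens the FIFO, the `open` call in the notebook will wait indefinitely, so it is worth checking the command on its own first.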
A few good internet searches should give you the information you need to use named pipes safely and effectively. Good luck!