Passing a pandas DataFrame to Python subprocess.Popen as an argument

Question

I am attempting to call a Python script from a master script. I need the DataFrame to be generated only once, inside the master script, and then passed to the subprocess script as an argument to be used inside the subprocess.

Following is my attempt at writing the required Python master script.

from subprocess import PIPE, Popen
import pandas as pd

test_dataframe = pd.read_excel(r'C:\test_location\file.xlsx',sheetname='Table')

sp = Popen(["python.exe",'C:/capture/test.py'], shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE)
sp.communicate(test_dataframe)

And here is the error: TypeError: argument 1 must be convertible to a buffer, not DataFrame

This is my first time using the subprocess module, so I am not very good at it yet. Any help will be much appreciated.

Answer 1

subprocess launches another application. The ways that processes can communicate with each other differ significantly from the ways that functions communicate within a Python program. You need to pass your DataFrame through a non-Pythonic environment, so you need to serialize it into bytes or text on one end and deserialize it on the other. For example, you can use the pickle module: sp.communicate(pickle.dumps(test_dataframe)) on one end and pickle.loads(sys.stdin.read()) on the other (on Python 3, read the raw bytes with sys.stdin.buffer.read() instead). Or you can write your DataFrame as CSV and then parse it again. Or you can use any other format.
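The CSV option mentioned above could look roughly like the following sketch. It is only an illustration under assumptions: the child program is inlined through `sys.executable -c` so the example runs on its own, and the names `CHILD_CODE`, `round_trip_csv`, and the columns `a`, `b`, and `total` are hypothetical and not part of the original question.

```python
import io
import subprocess
import sys

import pandas as pd

# Hypothetical child program, inlined so the sketch is self-contained.
# In practice this would live in its own script file.
CHILD_CODE = """
import sys
import pandas as pd

df = pd.read_csv(sys.stdin)          # parse the CSV text sent by the parent
df["total"] = df["a"] + df["b"]      # illustrative work on the data
df.to_csv(sys.stdout, index=False)   # send the result back as CSV text
"""


def round_trip_csv(df):
    """Send df to a child process as CSV text and read a DataFrame back."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=df.to_csv(index=False),   # serialize to text on the parent side
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,        # exchange str instead of bytes
        check=True,                     # raise if the child exits with an error
    )
    # Deserialize the child's CSV output back into a DataFrame.
    return pd.read_csv(io.StringIO(result.stdout))


if __name__ == "__main__":
    sample = pd.DataFrame({"a": [1, 2], "b": [10, 20]})
    print(round_trip_csv(sample))
```

CSV is readable and language-neutral, but it does not preserve everything pickle does: the index is dropped here, and dtypes are re-inferred when the text is parsed.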

Answer 2

Here is a complete example, for Python 3.6, of two-way communication between the master script and a subprocess.

master.py

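import pandas as pd
import pickle
import subprocess

# Recent pandas versions spell this keyword sheet_name (older ones used sheetname).
df = pd.read_excel(r'C:\test_location\file.xlsx', sheet_name='Table')

# Send the pickled DataFrame on the child's stdin and capture its stdout/stderr.
result = subprocess.run(['python', 'subroutine.py'], input=pickle.dumps(df), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
returned_df = pickle.loads(result.stdout)
# DataFrame == DataFrame is element-wise, so compare with equals() instead.
assert df.equals(returned_df)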

If there is a problem, you can check result.stderr.

subroutine.py

import pickle
import sys

data = pickle.loads(sys.stdin.buffer.read())
sys.stdout.buffer.write(pickle.dumps(data))
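
Building on the note about result.stderr, one possible way to make failures easier to diagnose is to check the child's return code before unpickling its output. The following is a sketch, not part of the original answer: `run_with_pickle` and `ECHO_CHILD` are hypothetical names, and the child is inlined through `sys.executable -c` as a stand-in for subroutine.py so the example is self-contained.

```python
import pickle
import subprocess
import sys


def run_with_pickle(command, payload):
    """Send payload pickled on stdin; return the object unpickled from stdout."""
    result = subprocess.run(
        command,
        input=pickle.dumps(payload),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    if result.returncode != 0:
        # Surface the child's error output instead of failing later
        # inside pickle.loads on empty or partial data.
        raise RuntimeError(result.stderr.decode(errors="replace"))
    return pickle.loads(result.stdout)


# Hypothetical stand-in for subroutine.py: echoes back whatever it receives.
ECHO_CHILD = (
    "import pickle, sys\n"
    "data = pickle.loads(sys.stdin.buffer.read())\n"
    "sys.stdout.buffer.write(pickle.dumps(data))\n"
)

if __name__ == "__main__":
    echoed = run_with_pickle([sys.executable, "-c", ECHO_CHILD], {"rows": [1, 2, 3]})
    print(echoed)
```

The same helper accepts a DataFrame as the payload. Note that pickle can execute arbitrary code when loading, so this pattern is only appropriate when both scripts are trusted.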
