Access the output of a Python script executed in R through a system command
My current question is a follow-up to the question linked below.
Not able to import pandas in R
I have executed the Python code from R with the system() command. At the end of the Python script, I want to access the DataFrame it creates from R. One way is to save the DataFrame in Python with df.to_csv and then import the file in R, but I am wondering whether there is a more efficient way to access the output directly in R.
x=system("/Users/ravinderbhatia/anaconda/bin/python /Users/ravinderbhatia/Downloads/Untitled3.py EMEA regulatory '10% productivity saves SOW'")
The output DataFrame is:
   description                 status    region
10 10% productivity saves SOW  pending   EMEA
16 10% productivity saves SOW  approved  EMEA
x just contains 0/1 (the exit status). So, as asked above: how can I access the DataFrame directly in R without saving it to a file?
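One way to avoid the temporary file, sketched here as an assumption rather than taken from the answer below: have the script write only the DataFrame, as CSV, to standard output. On the R side, `system(cmd, intern = TRUE)` returns the captured stdout lines as a character vector, which `read.csv(text = ..., row.names = 1)` can parse. This only works if nothing else goes to stdout, so the script's `print` calls for the arguments and `subset_data.head()` would need to move to stderr or be removed. The helper name `frame_to_csv_text` and the stand-in frame are hypothetical.

```python
import sys

import pandas as pd


def frame_to_csv_text(df: pd.DataFrame) -> str:
    # to_csv() with no path returns the CSV as a string; the index is
    # kept so the caller (e.g. R) still sees the original row labels.
    return df.to_csv(index=True)


def main() -> None:
    # Stand-in for temp = get_similar_CRs(arg1, arg2, arg3) in the script.
    temp = pd.DataFrame(
        {
            "description": ["10% productivity saves SOW"] * 2,
            "status": ["pending", "approved"],
            "region": ["EMEA", "EMEA"],
        },
        index=[10, 16],
    )
    # Diagnostics go to stderr so that stdout carries only the CSV.
    print(f"{len(temp)} rows", file=sys.stderr)
    sys.stdout.write(frame_to_csv_text(temp))


if __name__ == "__main__":
    main()
```

With the script changed this way, the R call would become something like `df <- read.csv(text = system(cmd, intern = TRUE), row.names = 1)`, where `cmd` is the same command string passed to `system()` in the question.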
The Python script used is:
import pandas as pd
import numpy as np
import sys
from difflib import SequenceMatcher

def similar(a, b):
    # similarity score between 0 and 1
    return SequenceMatcher(None, a, b).ratio()

# command-line arguments: region, type, new CR description
arg1 = sys.argv[1]
arg2 = sys.argv[2]
arg3 = sys.argv[3]
print(arg1)
print(arg2)
print(arg3)

def get_similar_CRs(arg1, arg2, arg3):
    ## create dummy data
    cr_id = range(1, 41)
    description = ['change in design', 'More robust system required',
                   'Grant system adminstrator rights',
                   'grant access to all products',
                   'Increase the credit limit',
                   'EDAP Scenario',
                   'Volume prpductivity for NA 2015',
                   '5% productivity saves SOW',
                   'effort reduction',
                   'reduction of false claims',
                   'Volume productivity EMEA',
                   'Volume productivity for NA 2016',
                   '10% productivity saves SOW',
                   ]
    region = ['EMEA', 'Asia Pacific', 'UK']
    business = ['card', 'secured loan', 'mortgage']
    type = ['regulatory', 'system', 'audit']
    status = ['pending', 'approved']
    data = pd.DataFrame()
    data['description'] = np.random.choice(description, 40)
    data['cr_id'] = cr_id
    data['region'] = np.random.choice(region, 40)
    data['business'] = np.random.choice(business, 40)
    data['status'] = np.random.choice(status, 40)
    data['type'] = np.random.choice(type, 40)
    # keep rows for the requested region, then the requested type
    subset_data = data.loc[data.region == arg1]
    print(subset_data.head())
    subset_data = subset_data.loc[subset_data.type == arg2]
    ## This has to be captured dynamically
    new_cr = arg3
    cr_list = data['description'].unique().tolist()
    similar_CR = []  # descriptions with similarity >= 0.8 to new_cr
    for cr in cr_list:
        result = similar(new_cr, cr)
        if result >= 0.8:
            similar_CR.append(cr)
    temp = subset_data.loc[subset_data.description.isin(similar_CR)]
    temp = temp[['description', 'status', 'region']]
    return temp

temp = get_similar_CRs(arg1, arg2, arg3)
print(temp)
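The matching step in `get_similar_CRs` keeps every description whose `difflib.SequenceMatcher.ratio()` against the new CR is at least 0.8. `ratio()` is 2·M/T, where M is the number of matched characters and T is the combined length of both strings, so near-identical descriptions pass as well. The sketch below reuses the script's `similar` helper on a few descriptions from the dummy data to show which ones survive the cut-off.

```python
from difflib import SequenceMatcher


def similar(a: str, b: str) -> float:
    # Same helper as in the script: 2*M / (len(a) + len(b)).
    return SequenceMatcher(None, a, b).ratio()


new_cr = "10% productivity saves SOW"
candidates = [
    "10% productivity saves SOW",
    "5% productivity saves SOW",
    "effort reduction",
]
for cr in candidates:
    score = similar(new_cr, cr)
    # Apply the same 0.8 cut-off used in get_similar_CRs.
    verdict = "kept" if score >= 0.8 else "dropped"
    print(f"{cr!r}: {score:.3f} {verdict}")
```

Here '5% productivity saves SOW' shares 24 characters with the query, giving 48/51 (about 0.941), so its rows would also be returned whenever they fall in the requested region and type.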
I suggest looking into the reticulate package (see the online vignette).
You can run your file with py_run_file() and access the Python main module with py. So let's say your file is called "Untitled3.py" and the data frame it creates is called df; then:
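library(reticulate)
# point reticulate at the Anaconda Python that has pandas installed
use_python("/Users/ravinderbhatia/anaconda/bin/python")
# run the script; its top-level objects end up in Python's main module
py_run_file("Untitled3.py")
# py gives access to the main module, so the DataFrame is converted to an R data.frame
py$df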
Edit
Alternatively, you can load only the functions from the Python file and call them from inside R. Have the Python file as:
import pandas as pd
import numpy as np
import sys
from difflib import SequenceMatcher

def similar(a, b):
    # similarity score between 0 and 1
    return SequenceMatcher(None, a, b).ratio()

def get_similar_CRs(arg1, arg2, arg3):
    ## create dummy data
    cr_id = range(1, 41)
    description = ['change in design', 'More robust system required',
                   'Grant system adminstrator rights',
                   'grant access to all products',
                   'Increase the credit limit',
                   'EDAP Scenario',
                   'Volume prpductivity for NA 2015',
                   '5% productivity saves SOW',
                   'effort reduction',
                   'reduction of false claims',
                   'Volume productivity EMEA',
                   'Volume productivity for NA 2016',
                   '10% productivity saves SOW',
                   ]
    region = ['EMEA', 'Asia Pacific', 'UK']
    business = ['card', 'secured loan', 'mortgage']
    type = ['regulatory', 'system', 'audit']
    status = ['pending', 'approved']
    data = pd.DataFrame()
    data['description'] = np.random.choice(description, 40)
    data['cr_id'] = cr_id
    data['region'] = np.random.choice(region, 40)
    data['business'] = np.random.choice(business, 40)
    data['status'] = np.random.choice(status, 40)
    data['type'] = np.random.choice(type, 40)
    # keep rows for the requested region, then the requested type
    subset_data = data.loc[data.region == arg1]
    print(subset_data.head())
    subset_data = subset_data.loc[subset_data.type == arg2]
    ## This has to be captured dynamically
    new_cr = arg3
    cr_list = data['description'].unique().tolist()
    similar_CR = []  # descriptions with similarity >= 0.8 to new_cr
    for cr in cr_list:
        result = similar(new_cr, cr)
        if result >= 0.8:
            similar_CR.append(cr)
    temp = subset_data.loc[subset_data.description.isin(similar_CR)]
    temp = temp[['description', 'status', 'region']]
    return temp
and then run:
library(reticulate)
# Install pandas and numpy into the Python environment used by reticulate
# (pass a vector: a second string argument would be taken as the environment name)
py_install(c("pandas", "numpy"))
py_run_file("Untitled3.py")
py$get_similar_CRs("EMEA", "regulatory", "10% productivity saves SOW")
#> description status region
#> 2 10% productivity saves SOW pending EMEA
#> 25 10% productivity saves SOW pending EMEA
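To keep a single file that works both from the command line (the `system()` call in the question) and through `py_run_file()`, one option, sketched here as an assumption rather than part of the answer, is to read `sys.argv` only when three arguments are actually present. Because `py_run_file()` (with its default `local = FALSE`) runs the file in Python's main module, `__name__` is likely `"__main__"` there too, so a plain `if __name__ == "__main__":` guard may not skip the argument lookups; checking the argument count avoids relying on that. `parse_cli_args` is a hypothetical name.

```python
import sys
from typing import Optional, Sequence, Tuple


def parse_cli_args(argv: Sequence[str]) -> Optional[Tuple[str, str, str]]:
    """Return (region, type, new_cr) when three command-line arguments
    were given, otherwise None.

    Hypothetical helper: when the file is executed from R without
    arguments, this returns None instead of raising IndexError on
    sys.argv[1].
    """
    if len(argv) < 4:
        return None
    return argv[1], argv[2], argv[3]


if __name__ == "__main__":
    # getattr guards against an embedded interpreter without sys.argv.
    args = parse_cli_args(getattr(sys, "argv", []))
    if args is not None:
        # Command-line use; in the real script this would be
        # print(get_similar_CRs(*args)).
        print(args)
```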