This is a follow-up to the question linked below.
Not able to import pandas in R
I run a Python script from R with the system() command. At the end of the script I want to access, from R, the DataFrame that Python created. One way is to save the DataFrame in Python with df.to_csv and then import the file in R, but I am wondering whether there is a more efficient way to access the output directly in R.
x=system("/Users/ravinderbhatia/anaconda/bin/python /Users/ravinderbhatia/Downloads/Untitled3.py EMEA regulatory '10% productivity saves SOW'")
The output DataFrame is:
                   description    status  region
10  10% productivity saves SOW   pending    EMEA
16  10% productivity saves SOW  approved    EMEA
x contains only the exit status (0 or 1). As mentioned above, how can I access the DataFrame directly in R without saving it first?
The Python script used is:
import pandas as pd
import numpy as np
import sys
from difflib import SequenceMatcher

def similar(a, b):
    # Similarity ratio between 0.0 and 1.0
    return SequenceMatcher(None, a, b).ratio()

arg1 = sys.argv[1]
arg2 = sys.argv[2]
arg3 = sys.argv[3]
print(arg1)
print(arg2)
print(arg3)

def get_similar_CRs(arg1, arg2, arg3):
    ## create dummy data
    cr_id = range(1, 41)
    description = ['change in design', 'More robust system required',
                   'Grant system adminstrator rights',
                   'grant access to all products',
                   'Increase the credit limit',
                   'EDAP Scenario',
                   'Volume prpductivity for NA 2015',
                   '5% productivity saves SOW',
                   'effort reduction',
                   'reduction of false claims',
                   'Volume productivity EMEA',
                   'Volume productivity for NA 2016',
                   '10% productivity saves SOW',
                   ]
    region = ['EMEA', 'Asia Pacific', 'UK']
    business = ['card', 'secured loan', 'mortgage']
    type = ['regulatory', 'system', 'audit']
    status = ['pending', 'approved']
    data = pd.DataFrame()
    data['description'] = np.random.choice(description, 40)
    data['cr_id'] = cr_id
    data['region'] = np.random.choice(region, 40)
    data['business'] = np.random.choice(business, 40)
    data['status'] = np.random.choice(status, 40)
    data['type'] = np.random.choice(type, 40)
    # Filter by region (arg1), then by type (arg2)
    subset_data = data.loc[data.region == arg1]
    print(subset_data.head())
    subset_data = subset_data.loc[subset_data.type == arg2]
    ## This has to be captured dynamically
    new_cr = arg3
    cr_list = data['description'].unique().tolist()
    similar_CR = []  # descriptions similar to new_cr
    # for new_cr in new_cr_lis
    for cr in cr_list:
        result = similar(new_cr, cr)
        if result >= 0.8:
            similar_CR.append(cr)
    temp = subset_data.loc[subset_data.description.isin(similar_CR)]
    temp = temp[['description', 'status', 'region']]
    return temp

temp = get_similar_CRs(arg1, arg2, arg3)
print(temp)
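The filter inside get_similar_CRs keeps every description whose SequenceMatcher ratio against the new CR is at least 0.8. A minimal standalone sketch of just that step follows; filter_similar and the short candidate list are made up for illustration and are not part of the script.

```python
from difflib import SequenceMatcher


def similar(a, b):
    """Return a similarity ratio between 0.0 and 1.0 for two strings."""
    return SequenceMatcher(None, a, b).ratio()


def filter_similar(new_cr, candidates, threshold=0.8):
    """Keep the candidates whose ratio against new_cr meets the threshold.

    Hypothetical helper: it mirrors the loop in get_similar_CRs.
    """
    return [c for c in candidates if similar(new_cr, c) >= threshold]


# Illustrative subset of the dummy descriptions from the script.
candidates = [
    "10% productivity saves SOW",
    "5% productivity saves SOW",
    "effort reduction",
]
matches = filter_similar("10% productivity saves SOW", candidates)
print(matches)
```

With these inputs, '5% productivity saves SOW' passes the threshold along with the exact match, so near-duplicate descriptions can also appear in the returned rows.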
I suggest looking into the reticulate package (see the online vignette). You can run your file with py_run_file() and access the Python main module with py. So let's say your file is called "Untitled3.py" and the data frame it creates is called df; then:
library(reticulate)
use_python("/Users/ravinderbhatia/anaconda/bin/python")
py_run_file("Untitled3.py")
py$df
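Two details matter for this to work with the script from the question. First, py only exposes names defined at the top level of the script, and the question's script stores its result in temp, so there it would be py$temp rather than py$df. Second, the script reads sys.argv[1] to sys.argv[3]; I have not verified what sys.argv holds when a file runs through py_run_file(), so the sketch below assumes the three arguments may be missing and falls back to defaults. resolve_args and DEFAULT_ARGS are hypothetical names, not part of reticulate or the original script.

```python
import sys

# Hypothetical defaults, copied from the command line shown in the question.
DEFAULT_ARGS = ("EMEA", "regulatory", "10% productivity saves SOW")


def resolve_args(argv, defaults=DEFAULT_ARGS):
    """Return (region, cr_type, new_cr), filling missing values from defaults."""
    given = tuple(argv[1:1 + len(defaults)])
    return given + defaults[len(given):]


# Defined at module top level so the values are also reachable from R
# as py$arg1, py$arg2 and py$arg3.
arg1, arg2, arg3 = resolve_args(sys.argv)
```

The rest of the script can then call get_similar_CRs(arg1, arg2, arg3) exactly as before, whether it is started from the shell or from R.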
Edit
Alternatively, you can import only the functions from the Python file and call them from inside R. E.g., have the Python file as
import pandas as pd
import numpy as np
import sys
from difflib import SequenceMatcher

def similar(a, b):
    # Similarity ratio between 0.0 and 1.0
    return SequenceMatcher(None, a, b).ratio()

def get_similar_CRs(arg1, arg2, arg3):
    ## create dummy data
    cr_id = range(1, 41)
    description = ['change in design', 'More robust system required',
                   'Grant system adminstrator rights',
                   'grant access to all products',
                   'Increase the credit limit',
                   'EDAP Scenario',
                   'Volume prpductivity for NA 2015',
                   '5% productivity saves SOW',
                   'effort reduction',
                   'reduction of false claims',
                   'Volume productivity EMEA',
                   'Volume productivity for NA 2016',
                   '10% productivity saves SOW',
                   ]
    region = ['EMEA', 'Asia Pacific', 'UK']
    business = ['card', 'secured loan', 'mortgage']
    type = ['regulatory', 'system', 'audit']
    status = ['pending', 'approved']
    data = pd.DataFrame()
    data['description'] = np.random.choice(description, 40)
    data['cr_id'] = cr_id
    data['region'] = np.random.choice(region, 40)
    data['business'] = np.random.choice(business, 40)
    data['status'] = np.random.choice(status, 40)
    data['type'] = np.random.choice(type, 40)
    # Filter by region (arg1), then by type (arg2)
    subset_data = data.loc[data.region == arg1]
    print(subset_data.head())
    subset_data = subset_data.loc[subset_data.type == arg2]
    ## This has to be captured dynamically
    new_cr = arg3
    cr_list = data['description'].unique().tolist()
    similar_CR = []  # descriptions similar to new_cr
    # for new_cr in new_cr_lis
    for cr in cr_list:
        result = similar(new_cr, cr)
        if result >= 0.8:
            similar_CR.append(cr)
    temp = subset_data.loc[subset_data.description.isin(similar_CR)]
    temp = temp[['description', 'status', 'region']]
    return temp
and then run
library(reticulate)
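# To install pandas and numpy in the regular python environment
# (the package names go in one character vector; a second argument would be read as the environment name)
py_install(c("pandas", "numpy"))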
py_run_file("Untitled3.py")
py$get_similar_CRs("EMEA", "regulatory", "10% productivity saves SOW")
#>                   description  status region
#> 2  10% productivity saves SOW pending   EMEA
#> 25 10% productivity saves SOW pending   EMEA
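If you would rather keep the system() call from the question, another option (not part of the approach above) is to have the script write the result as CSV to standard output. On the R side, system(..., intern = TRUE) captures those lines and read.csv(text = ...) parses them, so no file is written. The sketch below uses only the standard library with made-up rows standing in for the filtered DataFrame; rows_to_csv is a hypothetical helper. With pandas, temp.to_csv(sys.stdout, index=False) would play the same role.

```python
import csv
import io
import sys


def rows_to_csv(columns, rows):
    """Serialise a header plus data rows to a CSV string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical rows standing in for the DataFrame returned by get_similar_CRs.
columns = ["description", "status", "region"]
rows = [
    ["10% productivity saves SOW", "pending", "EMEA"],
    ["10% productivity saves SOW", "approved", "EMEA"],
]

# Diagnostic messages go to stderr so that stdout holds nothing but the CSV.
print("writing result as CSV", file=sys.stderr)
sys.stdout.write(rows_to_csv(columns, rows))
```

Note that the original script also prints its arguments and subset_data.head() to stdout; those prints would have to be removed or redirected to stderr for the captured text to parse cleanly.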