Access the output of python script in R executed through system command

Question

My current question is follow up question to the link below.

I have executed the python code in R with system command. Now at the end of python script I want to access the Dataframe created in R. One way is to save the Dataframe created in python with df.to_csv and then import it in R. But I am wondering any efficient way to directly access the output in R.

x=system("/Users/ravinderbhatia/anaconda/bin/python /Users/ravinderbhatia/Downloads/Untitled3.py EMEA regulatory '10% productivity saves SOW'")

output dataframe is:

description                       status region
10  10% productivity saves SOW   pending   EMEA
16  10% productivity saves SOW  approved   EMEA

X just contains 0/1(status). As mentioned above, how to access Dataframe directly in R without saving it.

Python script used is:


import pandas as pd 
import numpy as np
import sys


from difflib import SequenceMatcher

def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()

arg1 = sys.argv[1]
arg2 = sys.argv[2]
arg3 = sys.argv[3]
print (arg1)
print (arg2)
print (arg3)

def get_similar_CRs(arg1, arg2,arg3):
##create dummy data
    cr_id=range(1,41)

    description=['change in design','More robust system required',
                 'Grant system adminstrator rights',
                 'grant access to all products',
                 'Increase the credit limit',
                 'EDAP Scenario',
                 'Volume prpductivity for NA 2015',
                 '5% productivity saves SOW',
                 'effort reduction',
                 'reduction of false claims',
                 'Volume productivity EMEA',
                 'Volume productivity for NA 2016',
                 '10% productivity saves SOW',
                ]
    region=['EMEA','Asia Pacific','UK']

    business=['card','secured loan','mortgage']
    type=['regulatory','system','audit']

    status=['pending','approved']

    data=pd.DataFrame()
    data['description']=np.random.choice(description, 40)
    data['cr_id']=cr_id
    data['region']=np.random.choice(region,40)
    data['business']=np.random.choice(business, 40)
    data['status']=np.random.choice(status,40)
    data['type']=np.random.choice(type,40)

    subset_data=data.loc[data.region == arg1]
    print (subset_data.head())
    subset_data=subset_data.loc[subset_data.type ==arg2]

    ##This has to be captured dynamically
    new_cr=arg3

    cr_list=data['description'].unique().tolist()

    similar_CR=[] ###global variable
#     for new_cr in new_cr_lis
    for cr in cr_list:
        result=similar(new_cr,cr)
        if result >=0.8:
            similar_CR.append(cr)

    temp=subset_data.loc[subset_data.description.isin(similar_CR)]
    temp=temp[['description','status','region']]
    return temp

temp= get_similar_CRs (arg1, arg2, arg3)

print temp

Answer 1

I suggest looking into the reticulate package (see the online vignette ).

You can run your file with py_run_file() and access the python main module with py . So lets say your file is called "Untitled3.py" and the data frame it creates is called df , then

library(reticulate)

use_python("/Users/ravinderbhatia/anaconda/bin/python")

py_run_file("Untitled3.py")

py$df

Edit

Alternatively, you can import only the function from the python file and just call them from inside REg, have the python file as

import pandas as pd 
import numpy as np
import sys
from difflib import SequenceMatcher

def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()

def get_similar_CRs(arg1, arg2,arg3):
    ##create dummy data
    cr_id=range(1,41)

    description=['change in design','More robust system required',
                 'Grant system adminstrator rights',
                 'grant access to all products',
                 'Increase the credit limit',
                 'EDAP Scenario',
                 'Volume prpductivity for NA 2015',
                 '5% productivity saves SOW',
                 'effort reduction',
                 'reduction of false claims',
                 'Volume productivity EMEA',
                 'Volume productivity for NA 2016',
                 '10% productivity saves SOW',
                ]
    region=['EMEA','Asia Pacific','UK']

    business=['card','secured loan','mortgage']
    type=['regulatory','system','audit']

    status=['pending','approved']

    data=pd.DataFrame()
    data['description']=np.random.choice(description, 40)
    data['cr_id']=cr_id
    data['region']=np.random.choice(region,40)
    data['business']=np.random.choice(business, 40)
    data['status']=np.random.choice(status,40)
    data['type']=np.random.choice(type,40)

    subset_data=data.loc[data.region == arg1]
    print (subset_data.head())
    subset_data=subset_data.loc[subset_data.type ==arg2]

    ##This has to be captured dynamically
    new_cr=arg3

    cr_list=data['description'].unique().tolist()

    similar_CR=[] ###global variable
#     for new_cr in new_cr_lis
    for cr in cr_list:
        result=similar(new_cr,cr)
        if result >=0.8:
            similar_CR.append(cr)

    temp=subset_data.loc[subset_data.description.isin(similar_CR)]
    temp=temp[['description','status','region']]
    return temp

and then run

library(reticulate)

# To install pandas and numpy in the regular python environment
py_install("pandas", "numpy")

py_run_file("Untitled3.py")

py$get_similar_CRs("EMEA", "regulatory", "10% productivity saves SOW")

#>                   description  status region
#> 2  10% productivity saves SOW pending   EMEA
#> 25 10% productivity saves SOW pending   EMEA

Access the output of python script in R executed through system command

Question

1 answers

solution1
0 2018-05-13 19:13:39

Access the output of python script in R executed through system command

Question

1 answers

solution1 0 2018-05-13 19:13:39

solution1
0 2018-05-13 19:13:39