
R knitr: use spin() with R and Python code

With the advent of reticulate, combining R and Python in a single .Rmd document has become increasingly popular in the R community (myself included). My personal workflow usually starts with an R script and, at some point, I create a shareable report using knitr::spin() with the plain .R document as input in order to avoid code duplication (see also "Knitr's best hidden gem: spin" for more on the topic).

However, as soon as Python code is involved in my analysis, I am currently forced to break this workflow and manually convert (i.e. copy and paste) my initial .R script into .Rmd before compiling the report. Does anybody know whether it is, or ever will be, possible to make knitr::spin() work with both R and Python code chunks in a single .R file without taking this detour? I mean, just like it works when mixing the two languages, and exchanging objects between them, in a .Rmd file. To the best of my knowledge, there is currently no way to add something like engine = 'python' to spin documents.

Use of reticulate::source_python could be one solution.

For example, here is a simple .R script which will be spun to .Rmd and then rendered to .html:

spin-me.R

#'---
#'title: R and Python in a spin file.
#'---
#'
#' This is an example of one way to write a single R script that uses both R
#' and python and can be spun to .Rmd via knitr::spin.
#'
#+ label = "setup"
library(nycflights13)
library(ggplot2)
library(reticulate)
use_condaenv()

#'
#' Create the file flights.csv for the python script to read.
#'
#+ label = "create_flights_csv"
write.csv(flights, file = "flights.csv")

#'
#' The file flights.py will read in the data from the flights.csv file.  It can
#' be evaluated in this script via source_python().  This should add a data.frame
#' called `py_flights` to the workspace.
source_python(file = "flights.py")

#'
#' And now, plot the results.
#'
#+ label = "plot"
ggplot(py_flights) + aes(carrier, arr_delay) + geom_point() + geom_jitter()


# /* spin and knit this file to html
knitr::spin(hair = "spin-me.R", knit = FALSE)
rmarkdown::render("spin-me.Rmd")
# */

The python file is

flights.py

import pandas
py_flights = pandas.read_csv("flights.csv")
py_flights = py_flights[py_flights['dest'] == "ORD"]
py_flights = py_flights[['carrier', 'dep_delay', 'arr_delay']]
py_flights = py_flights.dropna()

And a screen capture of the resulting .html is:

[image: screen capture of the rendered report]

EDIT: If keeping everything in one file is a must, then before the source_python call you could create a Python file, e.g.,

pycode <-
'import pandas
py_flights = pandas.read_csv("flights.csv")
py_flights = py_flights[py_flights["dest"] == "ORD"]
py_flights = py_flights[["carrier", "dep_delay", "arr_delay"]]
py_flights = py_flights.dropna()
'
cat(pycode, file = "temp.py")
source_python(file = "temp.py")

My opinion: having the Python code in its own file would be preferable to having it created in the R script for two reasons:

  1. Easier reuse of the Python code
  2. Syntax highlighting in my IDE is lost for the Python code when written as a string and not in its own file.
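
To illustrate the reuse point, here is a minimal sketch of how flights.py could expose its filtering as a function instead of a fixed top-level script. The function name `load_delays` and its `dest` parameter are my own assumptions for illustration and are not part of the original flights.py; the pandas calls are the same ones used above.

```python
import pandas


def load_delays(csv_path, dest="ORD"):
    """Return carrier and delay columns for flights arriving at `dest`.

    Hypothetical helper: mirrors the steps in flights.py, but takes the
    file path and destination as arguments so it can be reused.
    """
    flights = pandas.read_csv(csv_path)
    # Keep only flights to the requested destination airport.
    flights = flights[flights["dest"] == dest]
    # Keep the three columns the report plots.
    flights = flights[["carrier", "dep_delay", "arr_delay"]]
    # Drop rows with a missing value in any of those columns.
    return flights.dropna()
```

Since source_python() makes the public objects defined in the sourced file available in the R session, including functions, the spin script could then presumably call something like py_flights <- load_delays("flights.csv") after sourcing the file, and the same Python file stays usable from plain Python.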

