
How do I keep title & subtitle when using pandoc to convert .docx to .md in R?

I'm downloading a Google Doc as .docx and then converting to markdown for manipulation and export to multiple formats.

Problem: When I convert using pandoc, it strips the title (and subtitle) and does not add any YAML header. I could add the title to the header manually, but this has to be scripted, so I need to either keep the title (ideally) or extract it from the docx and put it in a YAML header, which would then be concatenated with the converted markdown file.
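The scripted fallback described here, prepending a YAML header to the converted file, could be sketched in R as below. This is an illustration only: `prepend_yaml_header()` is a hypothetical helper, it assumes the title and subtitle have already been obtained some other way, and it relies on the yaml package (a dependency of rmarkdown) to quote values correctly.

```r
# Hypothetical helper (not part of rmarkdown or pandoc): prepend a YAML
# header with title/subtitle to an already-converted Markdown file.
prepend_yaml_header <- function(md_file, title, subtitle = NULL,
                                out_file = md_file) {
  body <- readLines(md_file, warn = FALSE, encoding = "UTF-8")
  meta <- list(title = title)
  if (!is.null(subtitle)) meta$subtitle <- subtitle
  # yaml::as.yaml() quotes values that need it (e.g. titles containing ": ");
  # drop its trailing newline so the closing "---" follows directly
  header <- sub("\n$", "", yaml::as.yaml(meta))
  writeLines(c("---", header, "---", "", body), out_file)
  invisible(out_file)
}

# Usage with a stand-in for pandoc's title-less output
md <- tempfile(fileext = ".md")
writeLines("Body text as converted by pandoc.", md)
prepend_yaml_header(md, title = "Example: a title", subtitle = "A subtitle")

# Read the header back to check it parses as front matter
front <- rmarkdown::yaml_front_matter(md)
```

Building the header with `yaml::as.yaml()` rather than `paste0("title: ", title)` avoids producing invalid YAML when a document title contains a colon.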

Example code, where the title is lost on conversion from docx to markdown:

library(rmarkdown)  # pandoc_convert(), render()
examplefile <- file.path(tempdir(), "example.docx")
# mode = "wb" keeps the binary .docx intact on Windows
download.file("https://file-examples.com/wp-content/uploads/2017/02/file-sample_100kB.docx",
              destfile = examplefile, mode = "wb")
# pandoc_convert() runs in the input's directory by default, so example.rmd lands in tempdir()
pandoc_convert(examplefile, to = "markdown", output = "example.rmd",
               options = c("--extract-media=."))

render(file.path(tempdir(), "example.rmd"), "html_document")
browseURL(file.path(tempdir(), "example.html"))

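When converting from docx to markdown (or to another markup format such as rst), you need to include the -s or --standalone option. pandoc's docx reader moves paragraphs styled Title and Subtitle into the document's metadata, and without -s only the body is written out.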

From the pandoc documentation:

-s, --standalone

Produce output with an appropriate header and footer (e.g. a standalone HTML, LaTeX, TEI, or RTF file, not a fragment). This option is set automatically for pdf, epub, epub3, fb2, docx, and odt output. For native output, this option causes metadata to be included; otherwise, metadata is suppressed.

Without -s, this metadata (including the title and subtitle) is suppressed.
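Applied to the code in the question, the only change is adding "--standalone" to `options`. The sketch below checks the round trip without downloading the sample file: it first uses pandoc itself to build a small .docx whose title and subtitle are written in the Title/Subtitle paragraph styles, then converts it back with `--standalone` and reads the resulting YAML header. The file names are illustrative, and the sketch assumes pandoc is available to rmarkdown; for a Google Doc it also assumes the title uses the built-in Title and Subtitle styles.

```r
library(rmarkdown)  # pandoc_available(), pandoc_convert(), yaml_front_matter()

stopifnot(pandoc_available())  # pandoc must be installed (RStudio bundles one)

dir <- tempfile("pandoc-demo-")
dir.create(dir)

# 1. Build a small .docx; pandoc's docx writer puts the title and
#    subtitle into paragraphs styled Title and Subtitle.
src_md <- file.path(dir, "source.md")
writeLines(c(
  "---",
  "title: My Title",
  "subtitle: A subtitle",
  "---",
  "",
  "Body paragraph."
), src_md)
docx <- file.path(dir, "example.docx")
pandoc_convert(src_md, to = "docx", output = docx)

# 2. Convert back to Markdown. --standalone makes pandoc write the
#    document metadata (title, subtitle, ...) as a YAML header.
out_md <- file.path(dir, "example.md")
pandoc_convert(docx, to = "markdown", output = out_md,
               options = c("--standalone", "--extract-media=."))

meta <- yaml_front_matter(out_md)
print(meta$title)
print(meta$subtitle)
```

With the header in place, `render()` picks up the title for the HTML output, so no separate extraction and concatenation step should be needed.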
