Can one extract images from pandoc's self-contained HTML files?

I have used pandoc with the --self-contained option to create HTML documents in which the images are embedded in the HTML code as base64.

Each image is included in an IMG tag like this (I have replaced the long string of base64 characters with a placeholder): <IMG src="data:image/png;base64,<<base64-coded characters here>>" width="672">

Now I'd like to extract such images, i.e. do the reverse: the base64-coded data should be converted to ordinary PNG or JPEG files saved on disk, and the data in the HTML replaced by references to those files.

I was hoping to use pandoc for this conversion, but I could not find an option for it in pandoc, nor have I found any other software that does it. Ideally, the solution should be a shell command or script that can easily be included in a longer toolchain.

You can use pandoc with the --extract-media option. The images will be written to the supplied directory, and the base64 URLs will be replaced with references to those files.

For example:

pandoc --from=html YOUR_FILE.html --extract-media=images
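
If pandoc is not available at some step of the toolchain, the extraction half of the task can be roughly approximated with standard shell tools. The sketch below is not how pandoc implements --extract-media; the function name extract_images and the imageN.EXT naming scheme are made up for illustration. It assumes each data URI sits on a single line, uses a plain subtype such as png or jpeg, and that grep supports -o. It only writes the image files and does not rewrite the src attributes in the HTML.

```shell
#!/bin/sh
# extract_images HTML_FILE OUT_DIR
# Hypothetical helper (not part of pandoc): decodes every base64 image
# data URI found in HTML_FILE and writes it to OUT_DIR/imageN.EXT.
extract_images() {
    html=$1
    dir=$2
    mkdir -p "$dir"
    n=0
    # grep -o prints each matching data URI on its own line.
    grep -o 'data:image/[a-z]*;base64,[A-Za-z0-9+/=]*' "$html" |
    while IFS= read -r uri; do
        n=$((n + 1))
        # Take the subtype (e.g. "png") as the file extension.
        ext=${uri#data:image/}
        ext=${ext%%;*}
        # Everything after "base64," is the encoded payload.
        printf '%s' "${uri#*base64,}" | base64 --decode > "$dir/image$n.$ext"
    done
}
```

Calling extract_images YOUR_FILE.html images would then leave the decoded files in the images directory; replacing the data URIs with file references is the part that the pandoc command above handles.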

