HTML to Text Conversion in Emacs

I have a bunch of org-mode files with snippets containing HTML code, and I would like to convert those snippets to plain text.

I don't need a fancy, fully automated solution; I can just paste an HTML snippet into a scratch buffer if that's easier.

Here's a simple example of the desired behavior:

<div><div>First Line<br>Second Line</div></div> 
First Line
Second Line

What options are available to Emacs users for such a task?

Emacs 24.4 (2014) added EWW, the Emacs Web Wowser, a built-in web browser. EWW renders HTML with the shr.el library, which you can also call directly, for example:

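(require 'shr) ; the HTML renderer that ships with Emacs

(with-temp-buffer
  (insert
   "<div><div>First Line<br>Second Line</div></div> ")
  ;; Replace the HTML in the buffer with its rendered text.
  (shr-render-region (point-min) (point-max))
  ;; Return the result without shr's text properties.
  (buffer-substring-no-properties (point-min) (point-max)))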

;; =>

"First Line
Second Line
"
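For snippets that live inside an org-mode file, the same call can be wrapped in a command that works on the active region. The following is only a minimal sketch: the names `my/html-to-text` and `my/html-region-to-text` are hypothetical (they are not part of shr.el), and it assumes an Emacs built with libxml2.

```elisp
(require 'shr)

;; Hypothetical helper: render an HTML string to plain text with shr.
(defun my/html-to-text (html)
  "Return HTML rendered as plain text by shr."
  (with-temp-buffer
    (insert html)
    (shr-render-region (point-min) (point-max))
    (buffer-substring-no-properties (point-min) (point-max))))

;; Hypothetical command: replace the selected HTML with its rendering.
(defun my/html-region-to-text (start end)
  "Replace the HTML between START and END with its plain-text rendering."
  (interactive "r")
  (let ((text (my/html-to-text
               (buffer-substring-no-properties start end))))
    (delete-region start end)
    (goto-char start)
    (insert text)))
```

Select an HTML snippet and run `M-x my/html-region-to-text` to convert it in place. Rendering happens in a temporary buffer, so shr's text properties never reach the org file. shr may wrap long lines to the window width; binding `shr-width` around the call is one way to control that.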

shr-render-region uses libxml-parse-html-region, which requires that your Emacs be built with libxml2 support.
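If you are unsure whether your build has that support, you can check before calling shr. This is a sketch under assumptions: `my/libxml-usable-p` is a hypothetical name, and `libxml-available-p` only exists in Emacs 27 and later, so the code falls back to an `fboundp` test on older versions.

```elisp
;; Hypothetical helper: report whether libxml2 HTML parsing is usable.
(defun my/libxml-usable-p ()
  "Return non-nil if this Emacs can parse HTML with libxml2."
  (and (fboundp 'libxml-parse-html-region)
       ;; Emacs 27+ can also confirm the library is actually loadable.
       (if (fboundp 'libxml-available-p)
           (libxml-available-p)
         t)))
```

Evaluate `(my/libxml-usable-p)` in the scratch buffer; a nil result means shr-render-region will signal an error on this build.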

The html2org package also seems to get the job done.

Its html2org function converts the HTML code to text and replaces it in place.
