
How to convert HTML to RTF using JavaScript

I have an input HTML file with a header and footer. It needs to be converted to RTF, and the HTML header/footer should be repeated on each page of the resulting RTF file.

Is there any plugin that converts HTML to RTF using only JavaScript?

You can use this converter.

However, it does not handle bullet points (ul, li elements).

function convertHtmlToRtf(html) {
  if (!(typeof html === "string" && html)) {
      return null;
  }

  var tmpRichText, hasHyperlinks;
  var richText = html;

  // Singleton tags
  richText = richText.replace(/<(?:hr)(?:\s+[^>]*)?\s*[\/]?>/ig, "{\\pard \\brdrb \\brdrs \\brdrw10 \\brsp20 \\par}\n{\\pard\\par}\n");
  richText = richText.replace(/<(?:br)(?:\s+[^>]*)?\s*[\/]?>/ig, "{\\pard\\par}\n");

  // Empty tags
  richText = richText.replace(/<(?:p|div|section|article)(?:\s+[^>]*)?\s*[\/]>/ig, "{\\pard\\par}\n");
  richText = richText.replace(/<(?:[^>]+)\/>/g, "");

  // Hyperlinks
  richText = richText.replace(
      /<a(?:\s+[^>]*)?(?:\s+href=(["'])(?:javascript:void\(0?\);?|#|return false;?|void\(0?\);?|)\1)(?:\s+[^>]*)?>/ig,
      "{{{\n");
  tmpRichText = richText;
  richText = richText.replace(
      /<a(?:\s+[^>]*)?(?:\s+href=(["'])(.+?)\1)(?:\s+[^>]*)?>/ig,
      "{\\field{\\*\\fldinst{HYPERLINK\n \"$2\"\n}}{\\fldrslt{\\ul\\cf1\n");
  hasHyperlinks = richText !== tmpRichText;
  richText = richText.replace(/<a(?:\s+[^>]*)?>/ig, "{{{\n");
  richText = richText.replace(/<\/a(?:\s+[^>]*)?>/ig, "\n}}}");

  // Start tags
  richText = richText.replace(/<(?:b|strong)(?:\s+[^>]*)?>/ig, "{\\b\n");
  richText = richText.replace(/<(?:i|em)(?:\s+[^>]*)?>/ig, "{\\i\n");
  richText = richText.replace(/<(?:u|ins)(?:\s+[^>]*)?>/ig, "{\\ul\n");
  richText = richText.replace(/<(?:strike|del)(?:\s+[^>]*)?>/ig, "{\\strike\n");
  richText = richText.replace(/<sup(?:\s+[^>]*)?>/ig, "{\\super\n");
  richText = richText.replace(/<sub(?:\s+[^>]*)?>/ig, "{\\sub\n");
  richText = richText.replace(/<(?:p|div|section|article)(?:\s+[^>]*)?>/ig, "{\\pard\n");

  // End tags
  richText = richText.replace(/<\/(?:p|div|section|article)(?:\s+[^>]*)?>/ig, "\n\\par}\n");
  richText = richText.replace(/<\/(?:b|strong|i|em|u|ins|strike|del|sup|sub)(?:\s+[^>]*)?>/ig, "\n}");

  // Strip any other remaining HTML tags [but leave their contents]
  richText = richText.replace(/<(?:[^>]+)>/g, "");

  // Prefix and suffix the rich text with the necessary syntax
  richText =
      "{\\rtf1\\ansi\n" + (hasHyperlinks ? "{\\colortbl\n;\n\\red0\\green0\\blue255;\n}\n" : "") + richText +  "\n}";

  return richText;
}
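
The missing bullet-point support could be approximated with a small pre-pass. The sketch below is not part of the converter above: `convertListsToRtf` is a hypothetical helper name, it only covers flat `ul`/`li` markup (no nesting, no `ol`), and it relies on the standard RTF control words `\bullet`, `\tab`, `\li` and `\fi` for a hanging indent.

```javascript
// Hypothetical helper, not part of the original converter: turns flat
// <ul>/<li> markup into bulleted RTF paragraphs. Nested lists are not handled.
function convertListsToRtf(html) {
  var richText = html;

  // Each list item becomes its own paragraph with a hanging indent:
  // \li720 indents the paragraph, \fi-360 pulls the first line back,
  // and \bullet\tab emits the bullet character followed by a tab.
  richText = richText.replace(/<li(?:\s+[^>]*)?>/ig, "{\\pard\\fi-360\\li720 \\bullet\\tab\n");
  richText = richText.replace(/<\/li(?:\s+[^>]*)?>/ig, "\n\\par}\n");

  // The <ul> wrapper itself carries no text, so it is simply dropped.
  richText = richText.replace(/<\/?ul(?:\s+[^>]*)?>/ig, "");

  return richText;
}

var listRtf = convertListsToRtf("<ul><li>One</li><li class=\"x\">Two</li></ul>");
```

Because the output no longer contains any list tags, the result can be passed on to `convertHtmlToRtf`, for example `convertHtmlToRtf(convertListsToRtf(html))`.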

After a bit of searching, I found a working solution:

https://www.npmjs.com/package/html-to-rtf

With html-to-rtf the conversion is easy (here's a piece of code based on browserify):

var htmlToRtf = require('html-to-rtf');
var htmlText = "<div>...</div>"; //or whatever html you want to transform
var htmlAsRtf = htmlToRtf.convertHtmlToRtf(htmlText); // html transformed to rtf

This solution worked for me. Without browserify, you'll have to locate the JS files the package depends on inside the modules downloaded with npm and link them into your HTML page.

No such thing, I'm afraid. I looked into this when searching for any HTML to RTF converter. Unfortunately, they are rare.

Your only option would be to make one based on the RTF specs. https://msdn.microsoft.com/en-us/library/aa140277(v=office.10).aspx
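
As a starting point for the hand-written route, the RTF specification defines `\header` and `\footer` destination groups whose contents are repeated on every page, which is what the question asks for. The sketch below is an illustration under assumptions: `buildRtfDocument` and `escapeRtfText` are hypothetical helper names, the header and footer are treated as plain text, and how faithfully a given RTF reader renders them may vary.

```javascript
// Hypothetical helper: escapes the three characters that have special
// meaning in RTF (backslash and both braces) by prefixing a backslash.
function escapeRtfText(text) {
  return text.replace(/[\\{}]/g, "\\$&");
}

// Hypothetical helper: wraps an already-converted RTF body in a document
// whose \header and \footer groups are repeated on every page.
function buildRtfDocument(headerText, bodyRtf, footerText) {
  return "{\\rtf1\\ansi\n" +
      "{\\header\\pard " + escapeRtfText(headerText) + "\\par}\n" +
      "{\\footer\\pard " + escapeRtfText(footerText) + "\\par}\n" +
      bodyRtf +
      "\n}";
}

var rtfDocument = buildRtfDocument("Report {draft}", "{\\pard Body text\\par}", "Confidential");
```

The body argument is expected to be RTF already (for instance the paragraphs produced by a converter, without the outer `{\rtf1 ... }` wrapper), while the header and footer are escaped here because they are passed as plain text.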

I applied @Samra's solution and it worked well. But then I spotted a bug in the output: some text was cut off. After a lot of investigation, the cause seemed to be that HTML comments ( <!-- xxxx --> ) weren't being handled properly. My solution was to add this richText transformation as the first one:

// Delete HTML comments
richText = richText.replace(/<!--[\s\S]*?-->/ig,"");
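
To see why comments need their own step, here is a small stand-alone sketch. `stripHtmlComments` is a hypothetical wrapper around the same regex, and the sample input is made up. It does not reproduce the exact truncation described above, only one way an unhandled comment can corrupt the output: the generic tag stripper stops at the first `>` it finds, which may be inside the comment.

```javascript
// Hypothetical stand-alone version of the comment-stripping step.
function stripHtmlComments(html) {
  // [\s\S] matches any character including newlines; *? stops at the first "-->".
  return html.replace(/<!--[\s\S]*?-->/g, "");
}

// Made-up sample: the comment body contains a ">" character.
var sample = "<p>Before<!-- note: a > b --> after</p>";

// Generic tag stripper from the converter, applied without and with the fix.
var withoutFix = sample.replace(/<(?:[^>]+)>/g, "");
var withFix = stripHtmlComments(sample).replace(/<(?:[^>]+)>/g, "");

console.log(JSON.stringify(withoutFix)); // "Before b --> after"
console.log(JSON.stringify(withFix));    // "Before after"
```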
