Parsing PlainText Emails from HTML Content (ASP.NET)

Right, in short: we already have a system in place that generates the HTML content for our emails. It's not perfect, but it works.

From this, we need to be able to derive a plaintext alternative for the email. My first thought was to jump straight in and write a RegEx to strip the <*> tags from the message, but then I realised this would be no good because we do need to keep some of the formatting information (paragraphs, line breaks, images etc).

NOTE: I am OK with actually sending the mail and setting up alternative views etc; this is only about getting plaintext from HTML.

So, I am pondering some ideas. I will post one as an answer to see what you guys think, but thought I would open it up to the floor. :)

If you need any more clarification then please shout.

Many thanks,

Rob

My Solution

OK, so here it is! I thought up a solution to my problem and it works like a charm!

Now, here are some of the goals I wanted to set out:

  • All the content for the emails should remain in the ASPX pages (as the HTML content currently does).
  • I didn't want the client code to do anything more than say SendMail("PageX.aspx").
  • I didn't want to write too much code.
  • I wanted to keep the code as semantically correct as possible (no REALLY crazy-ass hacks!).

The Process

So, this is what I ended up doing:

  • I went to the master page for the email messages and created an ASP.NET MultiView control. This control has two views: HTML and PlainText.
  • Within each view, I added content placeholders for the actual content.
  • I then grabbed all the existing ASPX code (such as the header and footer) and stuck it in the HTML view. All of it, DocType and everything. This does cause VS to whinge a little bit. Ignore it.
  • I then of course added new content to the PlainText view to best replicate the HTML view in a PlainText environment.
  • I then added some code to the master page's Page_Load, checking for the QueryString parameter "type", which can be either "html" or "text". It falls back to "text" if none is present. Depending on the value, it switches the view.
  • I then went to the content pages, added new placeholders for the PlainText equivalents and added text as required.
  • To make my life easier, I then overloaded my SendMail method to get the response for the required page twice, passing "type=html" and "type=text", and creating AlternateViews as appropriate.
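
The final step could be sketched roughly as follows. The original code isn't shown in this answer, so treat the details as assumptions: the EmailSender class, the baseUrl/from/to/subject parameters and fetching the page over HTTP with WebClient are all hypothetical. Only the "type=html" / "type=text" query string and the use of AlternateView come from the steps above.

```csharp
using System;
using System.Net;
using System.Net.Mail;
using System.Net.Mime;
using System.Text;

public static class EmailSender
{
    // Hypothetical helper: renders the same ASPX page twice (once per view)
    // and sends both renderings as alternate views of a single message.
    public static void SendMail(string baseUrl, string page,
                                string from, string to, string subject)
    {
        string html = FetchPage(baseUrl, page, "html");
        string text = FetchPage(baseUrl, page, "text");

        using (var message = new MailMessage(from, to))
        {
            message.Subject = subject;

            // multipart/alternative conventionally lists the plainest
            // version first and the richest version last.
            message.AlternateViews.Add(
                AlternateView.CreateAlternateViewFromString(
                    text, Encoding.UTF8, MediaTypeNames.Text.Plain));
            message.AlternateViews.Add(
                AlternateView.CreateAlternateViewFromString(
                    html, Encoding.UTF8, MediaTypeNames.Text.Html));

            // With no arguments, SmtpClient takes its settings from the
            // <system.net><mailSettings> section of the config file.
            using (var smtp = new SmtpClient())
            {
                smtp.Send(message);
            }
        }
    }

    // Requests e.g. http://host/emails/PageX.aspx?type=text and returns the body.
    private static string FetchPage(string baseUrl, string page, string type)
    {
        var uri = new Uri(new Uri(baseUrl), page + "?type=" + Uri.EscapeDataString(type));
        using (var client = new WebClient())
        {
            client.Encoding = Encoding.UTF8;
            return client.DownloadString(uri);
        }
    }
}
```

A call would then look something like EmailSender.SendMail("http://localhost/emails/", "PageX.aspx", "from@example.com", "to@example.com", "Hello"), where the URL and addresses are placeholders.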

In Summary

So, in short:

  • The Views separate the actual "views" of the content (HTML and Text).
  • A master page automatically switches the view based on a QueryString.
  • Content pages are responsible for how their views look.
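
As an illustration of the view switching, the master page code-behind might look something like the sketch below. The names EmailMaster, EmailViews, HtmlView and PlainTextView are made up for this example; the only parts taken from the description are the "type" QueryString parameter, its two values, and the fallback to "text".

```csharp
using System;
using System.Web.UI;
using System.Web.UI.WebControls;

// Code-behind for a hypothetical Email.master whose markup contains
//   <asp:MultiView ID="EmailViews" runat="server">
//     <asp:View ID="HtmlView" runat="server"> ... </asp:View>
//     <asp:View ID="PlainTextView" runat="server"> ... </asp:View>
//   </asp:MultiView>
public partial class EmailMaster : MasterPage
{
    // Normally generated in the designer file; declared here only so the
    // sketch is self-contained.
    protected MultiView EmailViews;
    protected View HtmlView;
    protected View PlainTextView;

    protected void Page_Load(object sender, EventArgs e)
    {
        string type = Request.QueryString["type"];

        // Anything other than "html" (including a missing parameter)
        // falls back to the plain-text view.
        bool wantsHtml = string.Equals(type, "html", StringComparison.OrdinalIgnoreCase);

        EmailViews.SetActiveView(wantsHtml ? HtmlView : PlainTextView);
    }
}
```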

Job done!

If any of this is unclear then please shout. I would like to write a more detailed blog post on this at some point.

My Idea

Create a page based on the HTML content and traverse the control tree. You can then pick the text out of the controls and handle different controls as required (eg use ALT text for images, "_____" for HR etc).
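
A rough sketch of that traversal, assuming the interesting elements are server controls (runat="server"). The ControlTreeText class and the square brackets around ALT text are my own choices for illustration. One caveat: static markup without runat="server" reaches the tree as LiteralControl instances whose Text still contains the raw tags, so this only gets you part of the way.

```csharp
using System.Text;
using System.Web.UI;
using System.Web.UI.HtmlControls;
using System.Web.UI.WebControls;

public static class ControlTreeText
{
    // Walks the control tree depth-first and collects a plain-text rendering.
    public static string Extract(Control root)
    {
        var sb = new StringBuilder();
        Walk(root, sb);
        return sb.ToString();
    }

    private static void Walk(Control control, StringBuilder sb)
    {
        var image = control as Image;                 // <asp:Image>
        var htmlImage = control as HtmlImage;         // <img runat="server">
        var generic = control as HtmlGenericControl;  // e.g. <hr runat="server">
        var textControl = control as ITextControl;    // Label, Literal, LiteralControl

        if (image != null)
        {
            sb.Append('[').Append(image.AlternateText).Append(']');
        }
        else if (htmlImage != null)
        {
            sb.Append('[').Append(htmlImage.Alt).Append(']');
        }
        else if (generic != null && generic.TagName.ToLowerInvariant() == "hr")
        {
            sb.AppendLine().AppendLine("_____");
        }
        else if (textControl != null)
        {
            // May still contain raw markup for static (non-server) HTML.
            sb.Append(textControl.Text);
        }

        foreach (Control child in control.Controls)
        {
            Walk(child, sb);
        }
    }
}
```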

You could ensure the HTML mail is in XHTML format so you can parse it easily using the standard XML tools, then write your own DOM serialiser that outputs plain text. It'd still be a lot of work to cover general XHTML, but for the limited subset you plan to use in e-mail it could work.
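
To give an idea of the size of such a serialiser, here is a minimal sketch using LINQ to XML. The XhtmlToText class, the list of block-level tags and the formatting choices ("_____" for HR, ALT text in brackets) are assumptions for illustration. It expects well-formed XHTML and does not deal with named entities such as &nbsp;, links, tables or nested lists.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class XhtmlToText
{
    // Elements that should be followed by a blank line in the output.
    private static readonly string[] BlockTags =
        { "p", "div", "h1", "h2", "h3", "li", "tr" };

    public static string Convert(string xhtml)
    {
        XDocument doc = XDocument.Parse(xhtml);
        var sb = new StringBuilder();
        Write(doc.Root, sb);
        return sb.ToString().Trim();
    }

    private static void Write(XElement element, StringBuilder sb)
    {
        string tag = element.Name.LocalName.ToLowerInvariant();
        switch (tag)
        {
            case "head":
            case "script":
            case "style":
                return;                                   // nothing visible
            case "br":
                sb.AppendLine();
                return;
            case "hr":
                sb.AppendLine().AppendLine("_____");
                return;
            case "img":
                // Fall back to a generic label when there is no alt attribute.
                sb.Append('[').Append((string)element.Attribute("alt") ?? "image").Append(']');
                return;
        }

        foreach (XNode node in element.Nodes())
        {
            var text = node as XText;
            var child = node as XElement;
            if (text != null)
            {
                // Collapse runs of whitespace the way a browser would.
                sb.Append(Regex.Replace(text.Value, @"\s+", " "));
            }
            else if (child != null)
            {
                Write(child, sb);
            }
        }

        if (BlockTags.Contains(tag))
        {
            sb.AppendLine().AppendLine();
        }
    }
}
```

Usage would be along the lines of string text = XhtmlToText.Convert(xhtmlString); anything the sketch doesn't recognise is simply flattened to its text content.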

Alternatively, if you don't mind shelling out to another program, you could just use the -dump switch of the lynx web browser.
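
Shelling out from .NET could look roughly like this. It assumes a lynx executable is installed and on the PATH of the account the site runs under; the LynxDump class and the temp-file approach are just one way to do it.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class LynxDump
{
    // Writes the HTML to a temporary .html file, runs "lynx -dump" on it
    // and returns whatever lynx prints to standard output.
    public static string HtmlToText(string html)
    {
        string path = Path.Combine(Path.GetTempPath(),
                                   Guid.NewGuid().ToString("N") + ".html");
        File.WriteAllText(path, html);
        try
        {
            var startInfo = new ProcessStartInfo
            {
                FileName = "lynx",                    // assumed to be on the PATH
                Arguments = "-dump \"" + path + "\"",
                RedirectStandardOutput = true,
                UseShellExecute = false,              // required for redirection
                CreateNoWindow = true
            };

            using (Process process = Process.Start(startInfo))
            {
                // Read before waiting so a full output buffer cannot deadlock.
                string text = process.StandardOutput.ReadToEnd();
                process.WaitForExit();
                return text;
            }
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```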
