Parsing PlainText Emails from HTML Content (ASP.NET)
In short, we already have a system in place that generates the HTML content for our emails. It's not perfect, but it works.
From this, we need to be able to derive a plaintext alternative for the email. My first instinct was to write a RegEx to strip the <*> tags from the message, but then I realised this would be no good because we do need some of the formatting information (paragraphs, line breaks, images etc).
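To make the problem concrete, here is a minimal sketch (in Python rather than ASP.NET, purely to keep it short and runnable) of the naive tag-stripping idea. The sample markup and the `strip_tags` name are made up for illustration.

```python
import re

def strip_tags(html: str) -> str:
    """Naive approach: delete anything that looks like a tag."""
    return re.sub(r"<[^>]*>", "", html)

# Hypothetical email fragment: two paragraphs, a line break and an image.
sample = '<p>Hello,</p><p>See the chart:<br><img src="chart.png" alt="Sales chart"></p>'

stripped = strip_tags(sample)
print(stripped)
```

The two paragraphs run together and the image's ALT text disappears along with the tag, which is exactly the formatting information we would like to keep.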
NOTE: I am OK with actually sending the mail and setting up alternative views etc.; this is only about getting plaintext from HTML.
So, I am pondering some ideas. I will post one as an answer to see what you guys think, but thought I would open it up to the floor. :)
If you need any more clarification then please shout.
Many thanks,
Rob
OK, so here it is! I thought up a solution to my problem and it works like a charm!
Now, here are some of the goals I wanted to set out:
I don't want the client code to have to do anything more than call SendMail("PageX.aspx").
So, this is what I ended up doing:
Added some code to Page_Load, checking for the QueryString parameter "type", which can be either "html" or "text". It falls back to "text" if none is present.
Had the SendMail method get the response for the required page, passing "type=html" and "type=text" and creating AlternateViews as appropriate.
So, in short:
Job done!
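For anyone who wants to see the shape of this, here is a rough sketch of the idea in Python rather than ASP.NET, so it is an analogy and not the actual code. `pick_view`, `build_mail` and `fake_fetch` are hypothetical names. The real version requests the page over HTTP and uses AlternateView, while this sketch uses a stand-in fetch function and the standard `email` package. Treating any unrecognised "type" value as "text" is also an assumption.

```python
from email.message import EmailMessage
from typing import Callable

def pick_view(query: dict) -> str:
    """Counterpart of the Page_Load check: 'html' only when asked for, else 'text'."""
    requested = query.get("type", "").lower()
    return "html" if requested == "html" else "text"

def build_mail(page: str, fetch: Callable[[str], str]) -> EmailMessage:
    """Request the same page twice and attach both renderings as alternatives."""
    text_body = fetch(f"{page}?type=text")
    html_body = fetch(f"{page}?type=html")
    msg = EmailMessage()
    msg.set_content(text_body)                      # text/plain part
    msg.add_alternative(html_body, subtype="html")  # text/html part
    return msg

def fake_fetch(url: str) -> str:
    """Stand-in for the HTTP request, so the sketch runs without a web server."""
    if url.endswith("type=html"):
        return "<p>Hello</p>"
    return "Hello"

mail = build_mail("PageX.aspx", fake_fetch)
print(mail.get_content_type())
print([part.get_content_type() for part in mail.iter_parts()])
```

The point of the design is that the page itself knows how to render both versions, so the calling code only ever names the page.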
If any of this is unclear then please shout. I would like to write a blog post on this in more detail at some point.
Create a page based on the HTML content and traverse the control tree. You can then pick the text from the controls and handle different controls as required (e.g. use ALT text for images, "_____" for HR etc).
You could ensure the HTML mail is in XHTML format so you can parse it easily using the standard XML tools, then create your own DOM serialiser that outputs plain text. It would still be a lot of work to cover general XHTML, but for the limited subset you plan to use in e-mail it could work.
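A minimal sketch of such a serialiser, written in Python with the standard `xml.etree.ElementTree` parser rather than the .NET XML classes. The choice of tags, the `to_text` name and the sample markup are assumptions for illustration, and whitespace normalisation is left out.

```python
import xml.etree.ElementTree as ET

# Assumed subset of block-level tags that should end with a blank line.
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3"}

def to_text(node: ET.Element) -> str:
    """Recursively serialise one element and its children as plain text."""
    # Drop an XML namespace prefix such as "{http://www.w3.org/1999/xhtml}p".
    tag = node.tag.rsplit("}", 1)[-1].lower()
    if tag == "br":
        return "\n"
    if tag == "hr":
        return "_____\n\n"
    if tag == "img":
        return node.get("alt", "")
    out = node.text or ""
    for child in node:
        out += to_text(child)
        out += child.tail or ""  # text that follows the child element
    if tag in BLOCK_TAGS:
        out = out.strip() + "\n\n"
    return out

xhtml = (
    '<body><p>Hello <b>there</b>,</p>'
    '<p>Chart:<br/><img src="c.png" alt="Sales chart"/></p>'
    '<hr/><p>Bye</p></body>'
)
plain = to_text(ET.fromstring(xhtml)).strip()
print(plain)
```

One thing to watch: a plain XML parser does not know named HTML entities such as `&nbsp;`, so the mail content would need to avoid them or declare them.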
Alternatively, if you don't mind shelling out to another program, you could just use the -dump switch of the lynx web browser.
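Shelling out could look roughly like this. It is a sketch in Python, assuming lynx is installed and on the PATH. The `lynx_dump` name and the temporary-file approach are choices made here for illustration, and the .html suffix is there so that lynx treats the file as HTML.

```python
import os
import subprocess
import tempfile

def lynx_dump(html: str, lynx: str = "lynx") -> str:
    """Render HTML to plain text by shelling out to `lynx -dump`."""
    # Write the markup to a temporary .html file for lynx to read.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".html", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(html)
        path = handle.name
    try:
        result = subprocess.run(
            [lynx, "-dump", path],
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError if lynx exits with an error
        )
        return result.stdout
    finally:
        os.remove(path)  # clean up the temporary file either way
```

Calling `lynx_dump("<p>Hello</p>")` should then return the rendered text. If the lynx binary is missing, `subprocess.run` raises `FileNotFoundError`.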