How do I convert a formatted email into plain text in Java?
I have a program that forwards an email as a text message to a customer.
A simple reply whose message body contains only the text "420" currently gets forwarded as:
*
<div dir="ltr">420</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Aug 8, 2013 at 4:14 PM, <span dir="ltr">< 3:50 AM+11111111111: (2/6)<a href="mailto:xxxxxx@gmail.com" target="_blank">xxxxxx@gmail.com</a>></span> wrote:<br> <blockquote class="gmail_quot 3:50 AM +14411111111: (3/6)e" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">414<div class="HOEnZb"><div class="h5"><br>DO_NOT_REPLY:This i 3:50 AM
: (4/6)s an email notification that you have received a text message from a customer in . If you reply to this email, a text message or 3:50 AM
(5/6)email message will NOT go to the customer. Access the customer text message to send a reply. </div></div></blockquote></div> 3:50 AM
(6/6)<br></div>
*
How do I remove all formatting from the text and forward only the message body?
I would suggest using JSoup. It makes it very easy to extract the text from HTML. A simple example follows.
Document doc = Jsoup.parse("My scores are <strong>good</strong> in <date>2013</date>");
String text = doc.body().text();
System.out.println(text);
This prints:
My scores are good in 2013
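Alternatively, the HTML parser that ships with the JDK's Swing packages can pull out the reply without a third-party library:
import java.io.IOException;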
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML.Attribute;
import javax.swing.text.html.HTML.Tag;
import javax.swing.text.html.HTMLEditorKit.Parser;
import javax.swing.text.html.HTMLEditorKit.ParserCallback;
import javax.swing.text.html.parser.ParserDelegator;
public class ExtractEmailBody
{
    public static void main(String[] args) throws IOException
    {
        String email = "<div dir=\"ltr\">420</div><div class=\"gmail_extra\"><br><br><div class=\"gmail_quote\">On Thu, Aug 8, 2013 at 4:14 PM, <span dir=\"ltr\">< 3:50 AM+11111111111: (2/6)<a href=\"mailto:xxxxxx@gmail.com\" target=\"_blank\">xxxxxx@gmail.com</a>></span> wrote:<br> <blockquote class=\"gmail_quot 3:50 AM +14411111111: (3/6)e\" style=\"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex\">414<div class=\"HOEnZb\"><div class=\"h5\"><br>DO_NOT_REPLY:This i 3:50 AM" +
            ": (4/6)s an email notification that you have received a text message from a customer in Kaarma. If you reply to this email, a text message or 3:50 AM" +
            "(5/6)email message will NOT go to the customer. Access the customer text message to send a reply. </div></div></blockquote></div> 3:50 AM" +
            "(6/6)<br></div>";

        // Remembers the text found inside <div dir="ltr">, which is where Gmail puts the reply.
        class EmailCallback extends ParserCallback
        {
            private String body_;
            private boolean divStarted_;

            public String getBody()
            {
                return body_;
            }

            @Override
            public void handleStartTag(Tag t, MutableAttributeSet a, int pos)
            {
                if (t.equals(Tag.DIV) && "ltr".equals(a.getAttribute(Attribute.DIR)))
                {
                    divStarted_ = true;
                }
            }

            @Override
            public void handleEndTag(Tag t, int pos)
            {
                if (t.equals(Tag.DIV))
                {
                    divStarted_ = false;
                }
            }

            @Override
            public void handleText(char[] data, int pos)
            {
                if (divStarted_)
                {
                    body_ = new String(data);
                }
            }
        }

        EmailCallback callback = new EmailCallback();
        Parser parser = new ParserDelegator();
        StringReader reader = new StringReader(email);
        // true = ignore charset declarations found in the HTML
        parser.parse(reader, callback, true);
        reader.close();
        System.out.println(callback.getBody());
    }
}
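The callback above keeps only the last text chunk seen inside a `<div dir="ltr">`, so a reply spread over several elements would be cut short. A rough variation, assuming Gmail-style markup in which the quoted message starts at the first element whose class begins with `gmail_` or at the first `<blockquote>`, is to collect every text chunk up to that point. The class name `ReplyTextExtractor` and the method `extractReply` are made up for this sketch, and other mail clients will need different markers.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class ReplyTextExtractor {

    /** Collects text until the first quote marker (assumed Gmail-style) is seen. */
    private static class ReplyCallback extends HTMLEditorKit.ParserCallback {
        private final StringBuilder text = new StringBuilder();
        private boolean quoteReached = false;

        @Override
        public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
            Object cssClass = a.getAttribute(HTML.Attribute.CLASS);
            // Assumption: the quoted original begins at a blockquote or a gmail_* wrapper.
            if (t == HTML.Tag.BLOCKQUOTE
                    || (cssClass != null && cssClass.toString().startsWith("gmail_"))) {
                quoteReached = true;
            }
        }

        @Override
        public void handleText(char[] data, int pos) {
            if (!quoteReached) {
                if (text.length() > 0) {
                    text.append(' '); // separate chunks from different elements
                }
                text.append(new String(data).trim());
            }
        }
    }

    /** Returns the plain text that precedes the quoted part of an HTML reply. */
    public static String extractReply(String html) {
        ReplyCallback callback = new ReplyCallback();
        try (StringReader reader = new StringReader(html)) {
            // true = ignore charset declarations found in the HTML
            new ParserDelegator().parse(reader, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.text.toString().trim();
    }

    public static void main(String[] args) {
        String email = "<div dir=\"ltr\">420</div><div class=\"gmail_extra\"><br><br>"
                + "<div class=\"gmail_quote\">On Thu, Aug 8, 2013 at 4:14 PM wrote:<br>"
                + "<blockquote class=\"gmail_quote\">414</blockquote></div></div>";
        System.out.println(extractReply(email));
    }
}
```

For the shortened sample in `main`, only the text before the `gmail_extra` wrapper is kept, so the forwarded text message would contain just the reply.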