I have a set of emails in a text file, and I want to extract the body from each of them. A sample is shown below.
Email: 1
===============
MIME-Version: 1.0
Received: by 10.68.8.6 with HTTP; Sat, 7 Apr 2012 01:04:45 -0700 (PDT)
Date: Sat, 7 Apr 2012 13:34:45 +0530
Delivered-To: twistyprincess22@gmail.com
Message-ID: <CAGibXq7_Gjqmp=jOCu2X8+Xngb5QuoqqMQ_ZKbu9jHCoJnFYgA@mail.gmail.com>
Subject: hello
From: twisty princess <twistyprincess22@gmail.com>
To: twisty princess <twistyprincess22@gmail.com>
Content-Type: multipart/alternative; boundary=047d7b33d826e6762004bd1239b5
--047d7b33d826e6762004bd1239b5
Content-Type: text/plain; charset=ISO-8859-1
hey How are you doing?
--047d7b33d826e6762004bd1239b5
Content-Type: text/html; charset=ISO-8859-1
<br><br>hey How are you doing?<br>
--047d7b33d826e6762004bd1239b5--
So from this text, I just want "hey How are you doing?". I want this done using Regular Expressions and C#. Thanks
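Use the regex boundary=([^\s]+) to find the boundary name:
var boundaryRegex = new Regex(@"boundary=([^\s]+)");
var bname = boundaryRegex.Match(text).Groups[1].Value;
Then build the text-capturing regex from bname:
// Escape the boundary so it is matched literally; Singleline lets '.' match newlines.
// The lookahead stops each capture at the next boundary line without consuming it.
var textCapturer = new Regex(string.Format("--{0}(?<text>.*?)(?=--{0})", Regex.Escape(bname)), RegexOptions.Singleline);
foreach (Match match in textCapturer.Matches(text))
{
    Console.WriteLine(match.Groups["text"].Value);
}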
This finds the value of the boundary parameter and then matches the text between the --BOUNDARY lines.
That said, I don't recommend doing this kind of parsing with regex.
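If you do go the regex route, here is a minimal self-contained sketch of the two steps, narrowed down to the text/plain part so that only "hey How are you doing?" comes back. The class name EmailBodySketch and the helper ExtractPlainText are made up for illustration. It assumes a single, non-nested multipart message, that Content-Type is the first header of each part, and that part headers are unfolded "Name: value" lines; it does not decode quoted-printable or base64 content.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailBodySketch
{
    // Hypothetical helper (not from the original answer): returns the body of the
    // first text/plain part, or null when no boundary or no such part is found.
    public static string ExtractPlainText(string email)
    {
        // Step 1: find the boundary name declared in the Content-Type header.
        // The value may or may not be wrapped in double quotes.
        Match boundary = Regex.Match(email, @"boundary=""?([^\s"";]+)");
        if (!boundary.Success) return null;
        string delimiter = Regex.Escape("--" + boundary.Groups[1].Value);

        // Step 2: capture everything between two consecutive delimiter lines.
        // Singleline lets '.' match newlines; the lookahead leaves the next
        // delimiter in place so it can start the following match.
        var partRegex = new Regex(delimiter + "(?<part>.*?)(?=" + delimiter + ")",
                                  RegexOptions.Singleline);

        // Leading "Name: value" header lines of a part, plus one optional blank line.
        // Assumption: in input with no blank separator line, the body itself
        // does not start with something that looks like a header.
        var headerRegex = new Regex(@"\A\s*(?:[\w-]+:[^\r\n]*\r?\n)+(?:\r?\n)?");

        foreach (Match m in partRegex.Matches(email))
        {
            string part = m.Groups["part"].Value;
            // Assumption: Content-Type is the first header of the part.
            if (!Regex.IsMatch(part, @"\A\s*Content-Type:\s*text/plain", RegexOptions.IgnoreCase))
                continue; // skip text/html and any other parts
            return headerRegex.Replace(part, "").Trim();
        }
        return null;
    }

    public static void Main()
    {
        // The multipart section of the sample email from the question.
        string email = string.Join("\n",
            "Content-Type: multipart/alternative; boundary=047d7b33d826e6762004bd1239b5",
            "--047d7b33d826e6762004bd1239b5",
            "Content-Type: text/plain; charset=ISO-8859-1",
            "hey How are you doing?",
            "--047d7b33d826e6762004bd1239b5",
            "Content-Type: text/html; charset=ISO-8859-1",
            "<br><br>hey How are you doing?<br>",
            "--047d7b33d826e6762004bd1239b5--");

        Console.WriteLine(ExtractPlainText(email)); // prints: hey How are you doing?
    }
}
```

Nested multiparts, folded headers, and encoded bodies all break these assumptions, which is why a dedicated MIME parser is the safer choice for real mailboxes.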