I have to develop a utility that accepts the path of a folder containing multiple log/text files of around 200 MB each, then traverses all the files and picks four elements from the lines where they exist.
I have tried multiple solutions. All of them work perfectly fine for smaller files, but when I load a bigger file the Windows Form just hangs or shows an "OutOfMemory Exception". Please help.
Solution 1:
string textFile;
string re1 = "((?:2|1)\\d{3}(?:-|\\/)(?:(?:0[1-9])|(?:1[0-2]))(?:-|\\/)(?:(?:0[1-9])|(?:[1-2][0-9])|(?:3[0-1]))(?:T|\\s)(?:(?:[0-1][0-9])|(?:2[0-3])):(?:[0-5][0-9]):(?:[0-5][0-9]))";
FolderBrowserDialog fbd = new FolderBrowserDialog();
DialogResult result = fbd.ShowDialog();
if (!string.IsNullOrWhiteSpace(fbd.SelectedPath))
{
    string[] files = Directory.GetFiles(fbd.SelectedPath);
    System.Windows.Forms.MessageBox.Show("Files found: " + files.Length.ToString(), "Message");
    foreach (string fileName in files)
    {
        textFile = File.ReadAllText(fileName);
        MatchCollection mc = Regex.Matches(textFile, re1);
        foreach (Match m in mc)
        {
            string a = m.ToString();
            Path.Text += a; //Temporary, Just to check the output
            Path.Text += Environment.NewLine;
        }
    }
}
Solution 2:
string re1 = "((?:2|1)\\d{3}(?:-|\\/)(?:(?:0[1-9])|(?:1[0-2]))(?:-|\\/)(?:(?:0[1-9])|(?:[1-2][0-9])|(?:3[0-1]))(?:T|\\s)(?:(?:[0-1][0-9])|(?:2[0-3])):(?:[0-5][0-9]):(?:[0-5][0-9]))";
FolderBrowserDialog fbd = new FolderBrowserDialog();
DialogResult result = fbd.ShowDialog();
foreach (string file in System.IO.Directory.GetFiles(fbd.SelectedPath))
{
    const Int32 BufferSize = 512;
    using (var fileStream = File.OpenRead(file))
    using (var streamReader = new StreamReader(fileStream, Encoding.UTF8, true, BufferSize))
    {
        String line;
        while ((line = streamReader.ReadLine()) != null)
        {
            MatchCollection mc = Regex.Matches(line, re1);
            foreach (Match m in mc)
            {
                string a = m.ToString();
                Path.Text += a; //Temporary, Just to check the output
                Path.Text += Environment.NewLine;
            }
        }
    }
}
Solution 3:
string re1 = "((?:2|1)\\d{3}(?:-|\\/)(?:(?:0[1-9])|(?:1[0-2]))(?:-|\\/)(?:(?:0[1-9])|(?:[1-2][0-9])|(?:3[0-1]))(?:T|\\s)(?:(?:[0-1][0-9])|(?:2[0-3])):(?:[0-5][0-9]):(?:[0-5][0-9]))";
FolderBrowserDialog fbd = new FolderBrowserDialog();
DialogResult result = fbd.ShowDialog();
// The loop over files was missing from the snippet; it is assumed to match Solution 2.
foreach (string file in System.IO.Directory.GetFiles(fbd.SelectedPath))
{
    using (StreamReader r = new StreamReader(file))
    {
        try
        {
            string line = String.Empty;
            while (!r.EndOfStream)
            {
                line = r.ReadLine();
                MatchCollection mc = Regex.Matches(line, re1);
                foreach (Match m in mc)
                {
                    string a = m.ToString();
                    Path.Text += a; //Temporary, Just to check the output
                    Path.Text += Environment.NewLine;
                }
            }
        }
        catch (Exception ex)
        {
            MessageBox.Show(ex.Message);
        }
    }
}
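A few things should be taken care of.
Avoid collecting the output with Path.Text += ... . I am assuming that is just test code that will hopefully be thrown out anyway.
The StreamReader loop can be replaced by a File.ReadLines call, with no practical difference in file reading speed for your case.
Below is sample code that implements the above guidelines. It also compiles the regex once and skips lines that cannot contain a timestamp before running it.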
string re1 = "((?:2|1)\\d{3}(?:-|\\/)(?:(?:0[1-9])|(?:1[0-2]))(?:-|\\/)(?:(?:0[1-9])|(?:[1-2][0-9])|(?:3[0-1]))(?:T|\\s)(?:(?:[0-1][0-9])|(?:2[0-3])):(?:[0-5][0-9]):(?:[0-5][0-9]))";
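var buf = new List<string>();
// Compile the pattern once and reuse it for every line.
var re2 = new Regex(re1, RegexOptions.Compiled);
FolderBrowserDialog fbd = new FolderBrowserDialog();
DialogResult result = fbd.ShowDialog();
foreach (string file in System.IO.Directory.GetFiles(fbd.SelectedPath)) {
    // File.ReadLines streams the file one line at a time.
    foreach (var line in File.ReadLines(file)) {
        // Cheap pre-check: skip lines with no '-' followed by ':' before running the regex.
        // Note: this assumes '-' date separators; the pattern itself also accepts '/'.
        int indx;
        if ((indx = line.IndexOf('-')) == -1 || line.IndexOf(':', indx + 1) == -1)
            continue;
        MatchCollection mc = re2.Matches(line);
        foreach (Match m in mc) {
            string a = m.ToString();
            buf.Add(a + Environment.NewLine); //Temporary, Just to check the output
        }
    }
}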
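The sample above still keeps every match in buf, which can itself grow large across many 200 MB files. One possible variation, sketched here under the assumption that the results can be consumed as a stream, is to yield matches lazily. LogScanner and FindMatches are hypothetical names, not part of the original answer.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

static class LogScanner
{
    // Hypothetical helper: yields matches one at a time, so neither a whole
    // file nor the full result set has to be held in memory at once.
    public static IEnumerable<string> FindMatches(string folder, Regex pattern)
    {
        foreach (string file in Directory.EnumerateFiles(folder))
        {
            // File.ReadLines reads lazily, one line per iteration.
            foreach (string line in File.ReadLines(file))
            {
                foreach (Match m in pattern.Matches(line))
                {
                    yield return m.Value;
                }
            }
        }
    }
}
```

A caller could then write the results straight to disk, for example File.WriteAllLines(outputPath, LogScanner.FindMatches(fbd.SelectedPath, re2)), where outputPath is an assumed destination that should live outside the scanned folder.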
Your "Path" debug output may be creating a ton of intermediate strings. Change it from += concatenation to a StringBuilder to see whether that is the cause of your memory issue.
Have you looked at MS Log Parser 2.2 as an alternative approach?
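A minimal sketch of the StringBuilder change, assuming the matches are gathered first and the text box is assigned only once at the end. MatchOutput and JoinMatches are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

static class MatchOutput
{
    // Hypothetical helper: appends every match to one StringBuilder instead of
    // building a new, longer string on each += against a TextBox.Text property.
    public static string JoinMatches(IEnumerable<string> lines, Regex pattern)
    {
        var sb = new StringBuilder();
        foreach (string line in lines)
        {
            foreach (Match m in pattern.Matches(line))
            {
                sb.AppendLine(m.Value);
            }
        }
        return sb.ToString();
    }
}
```

In the form, something like Path.Text = MatchOutput.JoinMatches(File.ReadLines(fileName), new Regex(re1)); would then touch the control once per file rather than twice per match.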