I have been trying to search for string patterns in a large text file. I am reading it line by line and checking each line, which takes a lot of time. I tried using a HashSet with ReadAllLines:
    HashSet<string> strings = new HashSet<string>(File.ReadAllLines(@"D:\Doc\Tst.txt"));
Now when I search for the string, it doesn't match, because the HashSet looks for a match on the entire row. I just want to check whether the string appears anywhere in the row.
I also tried this:
    using (System.IO.StreamReader file = new System.IO.StreamReader(@"D:\Doc\Tst.txt"))
    {
        while ((CurrentLine = file.ReadLine()) != null)
        {
            vals = chk_log(CurrentLine, date_Format, (range.Cells[i][counter]).Value2, vals);
            // Stop at the first matching line.
            if (vals)
                break;
        }
    }
    bool chk_log(string LineText, string date_to_chk, string publisher, bool tvals)
    {
        // True only when the line contains both the date and the publisher.
        tvals = LineText.Contains(date_to_chk) && LineText.Contains(publisher);
        return tvals;
    }
But this takes too much time. Any help would be appreciated.
Reading into a HashSet doesn't make sense to me (unless there are a lot of duplicated lines), since you aren't testing for membership of the set.
Taking a really naive approach you could just do this.
var isItThere = File.ReadAllLines(@"d:\docs\st.txt").Any(x =>
x.Contains(date_to_chk) && x.Contains(publisher));
65K lines at (say) 1K a line isn't a lot of memory to worry about, and I personally wouldn't bother with Parallel, since it sounds like it would be superfast to do anyway.
You could replace Any with First to find the first result (or FirstOrDefault, which returns null instead of throwing when nothing matches), or with Where to get an IEnumerable<string> containing all results.
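As a rough sketch of those three variants side by side (the class name LogSearch and its method names are made up for illustration, and FirstOrDefault is used in place of First so a miss returns null rather than throwing):

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; none of these names come from the original post.
public static class LogSearch
{
    // True when the line contains both substrings.
    // string.Contains(string) is an ordinal, case-sensitive comparison.
    public static bool Matches(string line, string dateToCheck, string publisher) =>
        line.Contains(dateToCheck) && line.Contains(publisher);

    // Any: is there at least one matching line? Stops at the first hit.
    public static bool AnyMatch(IEnumerable<string> lines, string dateToCheck, string publisher) =>
        lines.Any(x => Matches(x, dateToCheck, publisher));

    // FirstOrDefault: the first matching line, or null when nothing matches.
    public static string FirstMatch(IEnumerable<string> lines, string dateToCheck, string publisher) =>
        lines.FirstOrDefault(x => Matches(x, dateToCheck, publisher));

    // Where: every matching line, materialized into a list.
    public static List<string> AllMatches(IEnumerable<string> lines, string dateToCheck, string publisher) =>
        lines.Where(x => Matches(x, dateToCheck, publisher)).ToList();
}
```

Any of these can be fed from a file, for example LogSearch.AnyMatch(File.ReadLines(path), date_to_chk, publisher), where path stands for whatever file you are searching.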
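You can use a compiled regular expression instead of String.Contains (compile once before looping over the lines). This can give better performance for more complex patterns, although for plain substring checks String.Contains is often at least as fast, so it is worth measuring both on your data.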
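    // Two lookaheads require BOTH terms on the line; a plain "a|b" alternation
    // would match lines containing either one. Regex.Escape keeps characters
    // such as '.' or '(' in the search terms from being treated as pattern syntax.
    var regex = new Regex($"^(?=.*{Regex.Escape(date)})(?=.*{Regex.Escape(publisher)})", RegexOptions.Compiled);
    foreach (string line in File.ReadLines(@"D:\Doc\Tst.txt"))
    {
        if (regex.IsMatch(line)) break;
    }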
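This also uses File.ReadLines, a convenient standard library method that reads the file lazily, one line at a time, rather than loading the whole file into memory as File.ReadAllLines does.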
Or, depending on what you want to do...
var isItThere = File.ReadLines(@"D:\Doc\Tst.txt").Any(regex.IsMatch);