Read x number of lines of a file at a time C#
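I want to read and process 10 or more lines at a time from files that are gigabytes in size, but I haven't found a solution that hands out 10 lines at a time until the end of the file.
My last attempt was: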
int n = 10;
foreach (var line in File.ReadLines("path")
.AsParallel().WithDegreeOfParallelism(n))
{
System.Console.WriteLine(line);
Thread.Sleep(1000);
}
I have seen solutions that use a buffer size, but I want to read whole lines.
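File.ReadAllLines reads every line in one shot, and File.ReadLines streams the file lazily but only hands you one line at a time. If you want to control how many lines are read at once, you need to go a level deeper and use a StreamReader, which lets you control the reading process: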
using (StreamReader sr = new StreamReader(path))
{
while (sr.Peek() >= 0)
{
Console.WriteLine(sr.ReadLine());
}
}
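To make the StreamReader loop hand out ten lines at a time, one option is to collect lines into a small buffer and process the buffer whenever it fills. The following is only a minimal sketch: the names BatchReader, ReadInBatches and processBatch are hypothetical and not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class BatchReader
{
    // Hypothetical helper: reads lines from a text source and hands them
    // to processBatch in groups of at most batchSize lines.
    public static void ReadInBatches(TextReader reader, int batchSize,
                                     Action<IReadOnlyList<string>> processBatch)
    {
        var batch = new List<string>(batchSize);
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            batch.Add(line);
            if (batch.Count == batchSize)
            {
                processBatch(batch.ToArray()); // copy, so the buffer can be reused
                batch.Clear();
            }
        }
        if (batch.Count > 0)
            processBatch(batch.ToArray()); // final, possibly shorter batch
    }
}
```

It could be called as BatchReader.ReadInBatches(new StreamReader(path), 10, lines => { /* work on up to 10 lines */ }), with the reader wrapped in a using block. Only one batch is held in memory at a time, which matters for files of several gigabytes.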
StreamReader also has a ReadLineAsync method, which returns a Task.
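If you keep the processing tasks in a ConcurrentBag, you can quite easily keep the processing running on about 10 lines at a time. Because a StreamReader does not support overlapping reads, each read is awaited before the next one starts, and only the per-line processing runs concurrently:
var bag = new ConcurrentBag<Task>();
using (StreamReader sr = new StreamReader(path))
{
string line;
while ((line = await sr.ReadLineAsync()) != null)
{
string s = line; // per-iteration copy for the closure
bag.Add(Task.Run(() =>
{
//process line s
}));
if (bag.Count >= 10)
{
// wait until at least one task has finished
await Task.WhenAny(bag);
//remove completed tasks from bag
bag = new ConcurrentBag<Task>(bag.Where(t => !t.IsCompleted));
}
}
}
await Task.WhenAll(bag); // let the remaining tasks finish
Note that this code is a sketch for guidance only: it assumes that it runs inside an async method, that path is defined, and that System.Linq is imported for Where.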
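An alternative to tracking tasks by hand is to let Parallel.ForEach cap the number of lines processed concurrently through ParallelOptions.MaxDegreeOfParallelism. This is a different technique from the task-based answer above, shown only as a sketch; BoundedProcessing, ProcessLines and processLine are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BoundedProcessing
{
    // Hypothetical helper: runs processLine on each line, with at most
    // maxParallel invocations in flight at any moment.
    public static void ProcessLines(IEnumerable<string> lines, int maxParallel,
                                    Action<string> processLine)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        // The source is enumerated lazily, so a streamed sequence such as
        // File.ReadLines(path) is not loaded into memory all at once.
        Parallel.ForEach(lines, options, processLine);
    }
}
```

It could be called as BoundedProcessing.ProcessLines(File.ReadLines(path), 10, line => { /* process line */ }). Lines are not guaranteed to be processed in file order, so this only fits work where order does not matter.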
If you only need the last ten lines, you can use the solution here: How to read a text file reversely with iterator in C#
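For comparison, a simpler but slower way to get the last ten lines is to stream the file forward and keep only a sliding window of the most recent lines. This is not the reverse-reading technique from the linked question: it still scans the whole file, but it never holds more than count lines in memory. The names Tail and LastLines are hypothetical.

```csharp
using System.Collections.Generic;

public static class Tail
{
    // Hypothetical helper: returns the last `count` items of a sequence,
    // in their original order, using a fixed-size queue as a sliding window.
    public static string[] LastLines(IEnumerable<string> lines, int count)
    {
        if (count <= 0)
            return new string[0];

        var window = new Queue<string>(count);
        foreach (var line in lines)
        {
            if (window.Count == count)
                window.Dequeue(); // drop the oldest line
            window.Enqueue(line);
        }
        return window.ToArray(); // oldest to newest
    }
}
```

It could be called as Tail.LastLines(File.ReadLines(path), 10). For very large files the reverse reader from the linked question avoids the full scan.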
This method creates "pages" of lines from your file.
public static IEnumerable<string[]> ReadFileAsLinesSets(string fileName, int setLen = 10)
{
using (var reader = new StreamReader(fileName))
while (!reader.EndOfStream)
{
var set = new List<string>();
for (var i = 0; i < setLen && !reader.EndOfStream; i++)
{
set.Add(reader.ReadLine());
}
yield return set.ToArray();
}
}
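On .NET 6 or later, the built-in Enumerable.Chunk extension produces the same kind of pages from any lazy sequence of lines. A short sketch, assuming .NET 6+ is available; LinePages and Read are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LinePages
{
    // Hypothetical helper: lazily yields arrays of up to pageSize lines.
    // File.ReadLines streams the file and Chunk (available since .NET 6)
    // groups the stream, so the whole file is never loaded at once.
    public static IEnumerable<string[]> Read(string fileName, int pageSize = 10)
    {
        return File.ReadLines(fileName).Chunk(pageSize);
    }
}
```

A loop such as foreach (var page in LinePages.Read("YourFile.txt")) then receives ten lines per iteration, with a shorter final page when the line count is not a multiple of ten.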
...a more fun version...
class Example
{
static void Main(string[] args)
{
"YourFile.txt".ReadAsLines()
.AsPaged(10)
.Select(a=>a.ToArray()) //required or else you will get random data since "WrappedEnumerator" is not thread safe
.AsParallel()
.WithDegreeOfParallelism(10)
.ForAll(a =>
{
//Do your work here.
Console.WriteLine(a.Aggregate(new StringBuilder(),
(sb, v) => sb.AppendFormat("{0:000000} ", v),
sb => sb.ToString()));
});
}
}
public static class ToolsEx
{
public static IEnumerable<IEnumerable<T>> AsPaged<T>(this IEnumerable<T> items,
int pageLength = 10)
{
using (var enumerator = new WrappedEnumerator<T>(items.GetEnumerator()))
while (!enumerator.IsDone)
yield return enumerator.GetNextPage(pageLength);
}
public static IEnumerable<T> GetNextPage<T>(this IEnumerator<T> enumerator,
int pageLength = 10)
{
for (var i = 0; i < pageLength && enumerator.MoveNext(); i++)
yield return enumerator.Current;
}
public static IEnumerable<string> ReadAsLines(this string fileName)
{
using (var reader = new StreamReader(fileName))
while (!reader.EndOfStream)
yield return reader.ReadLine();
}
}
internal class WrappedEnumerator<T> : IEnumerator<T>
{
public WrappedEnumerator(IEnumerator<T> enumerator)
{
this.InnerEnumerator = enumerator;
this.IsDone = false;
}
public IEnumerator<T> InnerEnumerator { get; private set; }
public bool IsDone { get; private set; }
public T Current { get { return this.InnerEnumerator.Current; } }
object System.Collections.IEnumerator.Current { get { return this.Current; } }
public void Dispose()
{
this.InnerEnumerator.Dispose();
this.IsDone = true;
}
public bool MoveNext()
{
var next = this.InnerEnumerator.MoveNext();
this.IsDone = !next;
return next;
}
public void Reset()
{
this.IsDone = false;
this.InnerEnumerator.Reset();
}
}
Note that ReadLines will read all the lines of your GB file, not only the ones you print. Do you really need parallelism?
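If only the first few lines are wanted, combining File.ReadLines with Take stops pulling lines once the requested count is reached, because the enumeration is lazy. A small sketch; Preview and FirstLines are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class Preview
{
    // Hypothetical helper: returns at most `count` lines from the start
    // of the file. Take ends the enumeration early, so later lines are
    // never requested from the reader.
    public static List<string> FirstLines(string fileName, int count)
    {
        return File.ReadLines(fileName).Take(count).ToList();
    }
}
```

The underlying reader still fetches data in buffer-sized blocks, so slightly more than the requested lines may be read from disk, but not the whole file.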