What's the best way to get all the content between two tagged lines of a file so that you can deserialize it?
I've noticed that the following code does not scale well for large files (I think appending to the paneContent string is slow):
```csharp
string paneContent = String.Empty;
bool lineFound = false;
foreach (string line in File.ReadAllLines(path))
{
    if (line.Contains(tag))
    {
        lineFound = !lineFound;
    }
    else
    {
        if (lineFound)
        {
            paneContent += line;
        }
    }
}
using (TextReader reader = new StringReader(paneContent))
{
    data = (PaneData)(serializer.Deserialize(reader));
}
```
What's the best way to speed this up? I have a file that looks like this (I want to get all the content between the two tag lines and then deserialize it):
```text
A line with some tag
A line with content I want to get into a single stream or string
A line with content I want to get into a single stream or string
A line with content I want to get into a single stream or string
A line with content I want to get into a single stream or string
A line with content I want to get into a single stream or string
A line with some tag
```
Note: These tags are not XML tags.
You could use a StringBuilder instead of a string; that is what StringBuilder is for. Some example code is below:
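```csharp
using System.IO;
using System.Text;

var paneContent = new StringBuilder();
bool lineFound = false;
// File.ReadLines streams the file lazily instead of loading every line up front.
foreach (string line in File.ReadLines(path))
{
    if (line.Contains(tag))
    {
        // A tag line opens or closes a section; the tag line itself is skipped.
        lineFound = !lineFound;
    }
    else if (lineFound)
    {
        // AppendLine keeps the line breaks that ReadLines strips off,
        // so text spanning several lines is not glued together.
        paneContent.AppendLine(line);
    }
}
using (TextReader reader = new StringReader(paneContent.ToString()))
{
    data = (PaneData)serializer.Deserialize(reader);
}
```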
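For reference, here is a self-contained sketch of the same StringBuilder loop wrapped in a method. It assumes the content between the tags is XML that XmlSerializer can read; PaneData here is a hypothetical stand-in with two made-up properties (the real class isn't shown in the question), and PaneReader/ReadPane are illustrative names.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical stand-in for the question's PaneData; the real class
// and its XML shape are not shown in the question.
public class PaneData
{
    public string Name { get; set; }
    public int Width { get; set; }
}

public static class PaneReader
{
    // Collects every line between pairs of tag lines (the tag lines
    // themselves are skipped) and deserializes the result as XML.
    public static PaneData ReadPane(string path, string tag)
    {
        var paneContent = new StringBuilder();
        bool inside = false;
        foreach (string line in File.ReadLines(path)) // streams lines lazily
        {
            if (line.Contains(tag))
                inside = !inside;              // opening or closing tag line
            else if (inside)
                paneContent.AppendLine(line);  // keep the line breaks
        }

        var serializer = new XmlSerializer(typeof(PaneData));
        using (TextReader reader = new StringReader(paneContent.ToString()))
        {
            return (PaneData)serializer.Deserialize(reader);
        }
    }
}
```

Because each Append writes into the builder's buffer, the cost grows roughly linearly with the amount of content, rather than re-copying everything collected so far on every line as `+=` does.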
As mentioned in this answer, a StringBuilder is preferred over a string when you are concatenating in a loop, which is the case here.
Here is an example of how to use groups with regexes and retrieve their contents afterwards.
What you want is a regex that matches your tag lines and captures the content between them in a group; then retrieve the group's value, as in the example.
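A minimal C# sketch of the regex approach, assuming the whole file fits in memory as one string (for example from File.ReadAllText) and that at least one content line sits between the two tag lines. TagRegex, Between and the group name "content" are hypothetical names. RegexOptions.Singleline lets `.` match newlines, RegexOptions.Multiline makes `^` match at every line start, and Regex.Escape keeps any special characters in the tag literal.

```csharp
using System.Text.RegularExpressions;

public static class TagRegex
{
    // Returns the text between the first pair of lines containing `tag`,
    // or null if no such pair exists. Assumes at least one content line.
    public static string? Between(string text, string tag)
    {
        string t = Regex.Escape(tag);
        // Opening tag line, then a lazy named group "content" that stops
        // at the first following line which also contains the tag.
        var pattern = new Regex(
            @"^[^\n]*" + t + @"[^\n]*\r?\n(?<content>.*?)\r?\n[^\n]*" + t,
            RegexOptions.Singleline | RegexOptions.Multiline);
        Match m = pattern.Match(text);
        return m.Success ? m.Groups["content"].Value : null;
    }
}
```

For example, `TagRegex.Between(File.ReadAllText(path), "some tag")` returns the lines between the two tag lines, which can then be handed to a StringReader for deserialization. Since it reads the whole file into one string, a line-by-line loop is lighter on memory for very large files.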
Use a StringBuilder to build your data string (paneContent). It's much faster because each string concatenation allocates a new string and copies everything built so far into it.
StringBuilder appends into a pre-allocated buffer that grows as needed; if you expect large data strings, you can set its initial capacity through the constructor.
It's a good idea to read your input file line by line (for example with File.ReadLines or a StreamReader rather than File.ReadAllLines), so that you avoid loading the whole file into memory when you expect files with many lines of text.
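A sketch of line-by-line reading with a StreamReader that also stops at the closing tag line, so the rest of the file is never read. Unlike the toggling loops above, it returns only the first tagged section. StreamingPaneReader, ReadSection and the capacityHint parameter are hypothetical names; capacityHint simply forwards an initial capacity to the StringBuilder constructor.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamingPaneReader
{
    // Returns the text between the first two lines that contain `tag`,
    // reading one line at a time and stopping at the closing tag line.
    public static string ReadSection(TextReader reader, string tag, int capacityHint = 16)
    {
        var content = new StringBuilder(capacityHint); // initial buffer size
        bool inside = false;
        string? line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Contains(tag))
            {
                if (inside)
                    break;      // closing tag: stop reading
                inside = true;  // opening tag: start collecting
            }
            else if (inside)
            {
                content.AppendLine(line);
            }
        }
        return content.ToString();
    }

    // Convenience overload for a file path; StreamReader reads lazily,
    // so only the lines up to the closing tag (plus its buffer) are read.
    public static string ReadSection(string path, string tag, int capacityHint = 16)
    {
        using (var reader = new StreamReader(path))
        {
            return ReadSection(reader, tag, capacityHint);
        }
    }
}
```

Taking a TextReader keeps the core loop independent of where the text comes from, so the same method works on a file, a network stream, or an in-memory StringReader.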