
Loop through large XML file using XDocument

I have to copy nodes from an existing XML file to a newly created XML file. I'm using an XDocument instance to access the existing XML file. The problem is that the XML file can be quite large (say 500K lines; OpenStreetMap data).

What would be the best way to loop through large XML files without causing memory errors?

I currently just use XDocument.Load(path) and loop through doc.Descendants(), but this causes the program to freeze until the loop is done. So I think I have to loop asynchronously, but I don't know the best way to achieve this.

You can use XmlReader with an IEnumerable<XElement> iterator to yield only the elements you need.

This approach isn't asynchronous, but it saves memory because you don't need to load the whole file into memory to process it. Only the elements you select to copy are materialized.

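using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

public IEnumerable<XElement> ReadFile(string pathToTheFile)
{
    using (XmlReader reader = XmlReader.Create(pathToTheFile))
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "yourElementName")
            {
                // ReadFrom consumes the whole element and leaves the reader on the
                // node that follows it, so Read() must not be called again here;
                // otherwise an adjacent sibling element would be skipped.
                yield return (XElement)XNode.ReadFrom(reader);
            }
            else
            {
                reader.Read();
            }
        }
    }
}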
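Since the goal in the question is to copy the selected nodes into a new file, the streamed elements can be written out one at a time as well, so the output document is never held in memory either. The following is a minimal sketch under stated assumptions: the class name XmlCopySketch, the method CopyElements, and the rootName parameter are hypothetical names introduced for illustration, not part of the original answer.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

public static class XmlCopySketch
{
    // Writes each element of the sequence under a new root element.
    // Elements are written as they are enumerated, so a lazy source
    // (such as the iterator above) is consumed one element at a time.
    // Returns the number of elements copied.
    public static int CopyElements(IEnumerable<XElement> elements, string targetPath, string rootName)
    {
        int copied = 0;
        var settings = new XmlWriterSettings { Indent = true };
        using (XmlWriter writer = XmlWriter.Create(targetPath, settings))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement(rootName); // hypothetical root name chosen by the caller
            foreach (XElement element in elements)
            {
                element.WriteTo(writer);
                copied++;
            }
            writer.WriteEndElement();
            writer.WriteEndDocument();
        }
        return copied;
    }
}
```

A call might look like XmlCopySketch.CopyElements(ReadFile(sourcePath), targetPath, "osm"), where sourcePath, targetPath and the "osm" root name are placeholders.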

You can also read files asynchronously. Note that this version collects the matching elements in a list before returning them:

public async Task<IEnumerable<XElement>> ReadFileAsync(string pathToTheFile)
{
    var elements = new List<XElement>();
    var xmlSettings = new XmlReaderSettings { Async = true };
    using (XmlReader reader = XmlReader.Create(pathToTheFile, xmlSettings))
    {
        await reader.MoveToContentAsync();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "yourElementName")
            {
                // ReadFrom reads the element synchronously and leaves the reader
                // on the node after it, so no extra ReadAsync() call is made here.
                elements.Add((XElement)XNode.ReadFrom(reader));
            }
            else
            {
                await reader.ReadAsync();
            }
        }
    }

    return elements;
}
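If the target runtime supports async iterators (C# 8 with .NET Core 3.0 or later, which is an assumption and not something the question states), the two ideas can be combined so that elements are yielded one by one without being buffered in a list. This is only a sketch: XmlStreamSketch and ReadElementsAsync are hypothetical names, and XNode.ReadFromAsync is not available on .NET Framework.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Xml;
using System.Xml.Linq;

public static class XmlStreamSketch
{
    // Asynchronously streams every element with the given name.
    // Only the element currently being yielded is held in memory.
    public static async IAsyncEnumerable<XElement> ReadElementsAsync(string path, string elementName)
    {
        var settings = new XmlReaderSettings { Async = true };
        using (XmlReader reader = XmlReader.Create(path, settings))
        {
            await reader.MoveToContentAsync();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    // ReadFromAsync consumes the element and advances the reader
                    // past it, so the loop does not read again in this branch.
                    yield return (XElement)await XNode.ReadFromAsync(reader, CancellationToken.None);
                }
                else
                {
                    await reader.ReadAsync();
                }
            }
        }
    }
}
```

A consumer would iterate with await foreach (var element in XmlStreamSketch.ReadElementsAsync(path, "node")) { ... }, where "node" is a placeholder element name.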

Then you can read several files concurrently and await the results:

var fileTask1 = ReadFileAsync(filePath1);
var fileTask2 = ReadFileAsync(filePath2);
var fileTask3 = ReadFileAsync(filePath3);

// WhenAll completes once every file has been read
await Task.WhenAll(fileTask1, fileTask2, fileTask3);

// use results; the tasks are already complete, so awaiting them returns immediately
var elementsFromFile1 = await fileTask1;

Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original source.

 