Modifying XML file in-place?

Suppose I have the following XML file:

<book>
 <name>sometext</name>
 <name>sometext</name>
 <name>sometext</name>
 <name>Dometext</name>
 <name>sometext</name>
</book> 

If I wanted to modify the content by changing D to s (as shown in the fourth "name" node) without having to read/write the entire file, would this be possible?

A 10 MB file is not a problem. Slurp it up. Modify the DOM. Write it back to the filesystem. 10 GB is more of a problem. In that case:

Assumption: You are not changing the length of the file. Think of the file as an array of characters and not a (linked) list of characters: you cannot add characters in the middle, only change them.

You need to seek to the position in the file that you want to change and then write that character to disk.

In the .NET world, with a FileStream object, you want to set the Position property to the index of the D character and then write a single s character. Check out this question on random access of text files.
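The seek-and-overwrite idea might look roughly like the sketch below. It assumes the file is ASCII or UTF-8 and that both the old and new characters are single bytes, so the file length does not change. `IndexOf` and `OverwriteByteAt` are hypothetical helper names, not part of any library; the scan still reads the file up to the match, but nothing else is written.

```csharp
using System;
using System.IO;
using System.Text;

// Returns the byte offset of the first occurrence of pattern, or -1 if absent.
// Naive scan: reads the file up to the match but never writes anything.
static long IndexOf(string path, byte[] pattern)
{
    if (pattern.Length == 0) throw new ArgumentException("Empty pattern.", nameof(pattern));
    using var stream = new FileStream(path, FileMode.Open, FileAccess.Read);
    long start = 0;   // offset where the current candidate match begins
    int matched = 0;  // number of pattern bytes matched so far
    int b;
    while ((b = stream.ReadByte()) != -1)
    {
        if (b == pattern[matched])
        {
            if (++matched == pattern.Length) return start;
        }
        else
        {
            // Restart one byte past the failed candidate.
            if (matched > 0) stream.Position = start + 1;
            start++;
            matched = 0;
        }
    }
    return -1;
}

// Overwrites exactly one byte at the given offset; the file length is unchanged.
static void OverwriteByteAt(string path, long offset, byte newValue)
{
    using var stream = new FileStream(path, FileMode.Open, FileAccess.ReadWrite);
    if (offset < 0 || offset >= stream.Length)
        throw new ArgumentOutOfRangeException(nameof(offset));
    stream.Position = offset;    // seek
    stream.WriteByte(newValue);  // write a single byte in place
}
```

For the example file, something like `OverwriteByteAt("book.xml", IndexOf("book.xml", Encoding.ASCII.GetBytes("Dometext")), (byte)'s')` would flip the D to s, where `book.xml` is a placeholder path. This only works because the replacement has the same byte length as the original.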

Also read this question: How to insert characters to a file using C#. It looks like FileStream can't really insert anything; you will have to resort to overwriting individual bytes.

Good luck. But really, if we are only talking 10 MB, then just slurp it up. The computer should be doing your work.

I would just read in the file, process it, and spit it back out.

This can be done in a streaming fashion with XmlReader -- it's more manual work than XmlDocument or XDocument, but it does avoid creating an in-memory DOM (XmlDocument/XDocument can be used with this same read/write pattern, but they generally build the whole document in memory):

  1. Open the input file stream (XmlReader)
  2. Open the output file stream (XmlWriter, to a different file)
  3. Read from the XmlReader and write to the XmlWriter, performing any transformations as necessary
  4. Close the streams
  5. Move the new file over the old file (overwrite, an atomic action)
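
The five steps above could be sketched as follows. This is a minimal version under a few assumptions: the document has no DTD or entity references, the temporary file name (`inputPath + ".tmp"`) is made up for illustration, and `File.Move` with the `overwrite` parameter needs .NET Core 3.0 or later. `RewriteTextNodes` is a hypothetical helper name.

```csharp
using System;
using System.IO;
using System.Xml;

// Streams inputPath to a temporary file, replacing oldText inside text nodes,
// then moves the temporary file over the original.
static void RewriteTextNodes(string inputPath, string oldText, string newText)
{
    // Hypothetical temp name; same directory so the final move stays on one volume.
    string tempPath = inputPath + ".tmp";

    using (XmlReader reader = XmlReader.Create(inputPath))   // step 1
    using (XmlWriter writer = XmlWriter.Create(tempPath))    // step 2
    {
        while (reader.Read())                                // step 3
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                    writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                    writer.WriteAttributes(reader, true);
                    if (reader.IsEmptyElement) writer.WriteEndElement();
                    break;
                case XmlNodeType.EndElement:
                    writer.WriteFullEndElement();
                    break;
                case XmlNodeType.Text:
                    // The only transformation: patch the text content.
                    writer.WriteString(reader.Value.Replace(oldText, newText));
                    break;
                case XmlNodeType.Whitespace:
                case XmlNodeType.SignificantWhitespace:
                    writer.WriteWhitespace(reader.Value);
                    break;
                case XmlNodeType.CDATA:
                    writer.WriteCData(reader.Value);
                    break;
                case XmlNodeType.Comment:
                    writer.WriteComment(reader.Value);
                    break;
                case XmlNodeType.ProcessingInstruction:
                    writer.WriteProcessingInstruction(reader.Name, reader.Value);
                    break;
                // XmlDeclaration is skipped: the writer emits its own declaration.
            }
        }
    }                                                        // step 4: both streams closed

    File.Move(tempPath, inputPath, overwrite: true);         // step 5
}
```

Only one node is held in memory at a time, so the cost is a second copy of the file on disk rather than a DOM in RAM.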

While this can be set up to process input and output on the same open file with a bunch of really clever work, nothing will be saved and there are many edge cases, including increasing or decreasing file lengths. In fact, it might be slower to try and simply shift the contents of a file backwards to fill in gaps or shift the file contents forward to make new room. The filesystem cache will likely make any "gains" minimal/moot for anything but the most basic length-preserving operation. In addition, modifying a file in place is not an atomic action and is generally non-recoverable in case of an error: at the expense of a temporary file, the read/write/move approach is atomic with respect to the final file contents.

Or, consider XSLT -- it was designed for this ;-)
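
As a rough illustration of the XSLT route, the sketch below applies an identity transform with one extra template that rewrites the offending text node. The stylesheet and the `FixNameWithXslt` helper are assumptions made up for this example. Note that, as far as I know, `XslCompiledTransform` loads the input into memory when given a file path, so this is about convenience rather than handling a 10 GB file.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Copies inputPath to outputPath, replacing the text "Dometext" inside <name> elements.
static void FixNameWithXslt(string inputPath, string outputPath)
{
    // Identity transform plus one more specific template for the text to change.
    const string stylesheet = @"
<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:template match='@*|node()'>
    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>
  </xsl:template>
  <xsl:template match=""name/text()[. = 'Dometext']"">sometext</xsl:template>
</xsl:stylesheet>";

    var transform = new XslCompiledTransform();
    using (XmlReader styleReader = XmlReader.Create(new StringReader(stylesheet)))
    {
        transform.Load(styleReader);
    }

    // Write to a different file; it can be moved over the original afterwards.
    transform.Transform(inputPath, outputPath);
}
```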

Happy coding.

The cleanest (and best) way is to work with an XmlDocument object, but a quick and dirty solution is to read the XML as a string and then:

xmlText = xmlText.Replace("Dometext", "sometext");
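
For comparison, the XmlDocument route might look something like this. It is a sketch that assumes the `/book/name` structure from the question and a file small enough to load whole; `FixNames` is a hypothetical helper name.

```csharp
using System.Xml;

// Loads the whole document, rewrites matching <name> elements, and saves it back.
// Returns the number of elements that were changed.
static int FixNames(string path, string oldText, string newText)
{
    var doc = new XmlDocument();
    doc.PreserveWhitespace = true;  // keep the original indentation and line breaks
    doc.Load(path);

    int changed = 0;
    foreach (XmlNode node in doc.SelectNodes("/book/name"))
    {
        if (node.InnerText == oldText)
        {
            node.InnerText = newText;
            changed++;
        }
    }

    doc.Save(path);
    return changed;
}
```

Unlike the string replace, this only touches the text of `name` elements, so the same string appearing in an attribute or a comment is left alone.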

An XML file is a text file, and a text file does not allow for insertions or deletions in the middle. The only mutations supported are overwrite and append. That is not a good match for XML.

So, first make very sure you really need this. It's a complicated operation, only worth it on very large files.

Since there could be a change in length, you will at least have to move everything after the first replacement. The possibility of multiple replacements means you may need a big buffer to accommodate the changes.

It's easier to copy the whole file. That is expensive in I/O but you save on memory use.
