
Remove text line in XML file C# .NET

I need to write an app that removes a specific line of text from a very large XML file (about 3.5 GB).

I wrote this code:

    string directoryPath;

    OpenFileDialog ofd = new OpenFileDialog();

    private void button1_Click(object sender, EventArgs e)
    {
        ofd.Filter = "XML|*.xml";
        if (ofd.ShowDialog() == DialogResult.OK)
        {
            directoryPath = Path.GetDirectoryName(ofd.FileName);
            textBox2.Text = directoryPath;
            textBox1.Text = ofd.SafeFileName;
        }
    }

    private void Replace()
    {
        StreamReader readerFile = new StreamReader(ofd.FileName, System.Text.Encoding.UTF8);

        while (!readerFile.EndOfStream)
        {
            string stringReplaced;
            string replaceResult = textBox2.Text + "\\" + "replace_results";
            Directory.CreateDirectory(replaceResult);
            StreamWriter writerFile = new StreamWriter(replaceResult + "\\" + textBox1.Text, true);
            StringBuilder sb = new StringBuilder();
            char[] buff = new char[10 * 1024 * 1024];
            int xx = readerFile.ReadBlock(buff, 0, buff.Length);
            sb.Append(buff);
            stringReplaced = sb.ToString();
            stringReplaced = stringReplaced.Replace("line to remove", string.Empty);
            writerFile.WriteLine(stringReplaced);
            writerFile.Close();
            writerFile.Dispose();
            stringReplaced = null;
            sb = null;
        }


        readerFile.Close();
        readerFile.Dispose();
    }

    private void button2_Click(object sender, EventArgs e)
    {
        if (!backgroundWorker1.IsBusy)
        {
            backgroundWorker1.RunWorkerAsync();
            toolStripStatusLabel1.Text = "Replacing in progress...";
        }
    }

    private void backgroundWorker1_DoWork(object sender, DoWorkEventArgs e)
    {
        try
        {
            Replace();
            toolStripStatusLabel1.Text = "Replacing complete!";
        }
        catch
        {
            toolStripStatusLabel1.Text = "Error! Replacing aborted!";
        }
    }
}

It works, but not well: the new file (after the lines are removed) is bigger than the original, and some junk (lots of dots) is appended at the end of the new file. Screenshot:

https://images81.fotosik.pl/615/873833aa0e23b36f.jpg

How can I fix my code so that the new file is the same as the old one, only without the specific lines?

For a start, why keep opening and closing the output file? Keep it open.

Secondly, reading blocks (which could lead to "line to remove" being split across two blocks) and writing lines is an odd mix.

But I expect your issue is threefold:

  1. You do not set the encoding of the output file.

  2. When you read into the buffer (10 MB) you may get fewer characters than requested; that count is the return value of ReadBlock. But you always write the complete block. Limit the write to match the amount read (as updated by the replace).

  3. ReadBlock will include the line endings, but WriteLine adds its own: work either on blocks or on lines. Mixing the two will only create problems (and working on lines also avoids the second issue above).
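To illustrate points 2 and 3, here is a minimal sketch of a block-based copy that writes only the characters ReadBlock reported and adds no extra line terminators. The class name `BlockCopy` and the `bufferSize` parameter are made up for this example. In the question's code the whole 10 MB array is appended on every pass, so the unused tail of the last block (null characters) is probably the junk seen at the end of the file.

```csharp
using System;
using System.IO;

public static class BlockCopy
{
    // Copies text in fixed-size blocks. bufferSize is a hypothetical
    // parameter; the question uses 10 * 1024 * 1024.
    public static void Copy(TextReader reader, TextWriter writer, int bufferSize)
    {
        char[] buffer = new char[bufferSize];
        int read;

        // ReadBlock returns the number of characters actually read,
        // which is smaller than the buffer for the last block.
        while ((read = reader.ReadBlock(buffer, 0, buffer.Length)) > 0)
        {
            // Write exactly `read` characters; Write (unlike WriteLine)
            // does not append a line terminator.
            writer.Write(buffer, 0, read);
        }
    }
}
```

This only copies. A text to remove that straddles two blocks would still be missed by a per-block Replace, which is one more reason to work on lines instead.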

This leads to code something like: 这导致代码类似:

    using (var rdr = OpenReadFile(...))
    using (var wtr = OpenWriteFile(...))
    {
        string line;
        while ((line = rdr.ReadLine()) != null)
        {
            line = line.Replace(x, y);
            wtr.WriteLine(line); // write to the output writer opened above
        }
    }
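A slightly fuller sketch of the same line-based idea, with the output encoding set explicitly. It assumes the text to remove always occupies a whole line, so matching lines are skipped rather than replaced with an empty string (which would leave blank lines behind). `LineFilter`, `RemoveLines`, `marker` and the path parameters are names invented for this example.

```csharp
using System;
using System.IO;
using System.Text;

public static class LineFilter
{
    // Streams lines from reader to writer, skipping every line that
    // contains `marker` (hypothetical name for the text to remove).
    public static void RemoveLines(TextReader reader, TextWriter writer, string marker)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Contains(marker))
            {
                continue; // drop the whole line
            }
            writer.WriteLine(line);
        }
    }

    // File-based wrapper: both files stay open for the whole run and
    // the output encoding is stated explicitly (UTF-8 without a BOM).
    public static void RemoveLines(string inputPath, string outputPath, string marker)
    {
        var utf8NoBom = new UTF8Encoding(false);
        using (var reader = new StreamReader(inputPath, Encoding.UTF8))
        using (var writer = new StreamWriter(outputPath, false, utf8NoBom))
        {
            RemoveLines(reader, writer, marker);
        }
    }
}
```

Note that WriteLine terminates every line with Environment.NewLine, so line endings are normalised and the last line always ends with a terminator; the output is therefore not guaranteed to be byte-for-byte identical to the input minus the removed lines.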

NB: Processing XML as text could corrupt the XML (there is no such thing as "invalid XML": either the document is valid XML or it isn't XML, just something that looks a bit like it might be XML). Therefore any such approach needs to be handled with caution. The "proper" answer is to process it as XML with the streaming APIs (XmlReader and XmlWriter) to avoid parsing the whole document in one go.
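As a rough sketch of that streaming approach, the code below copies a document node by node and drops every element with a given name. It assumes the "line to remove" corresponds to a whole element; `XmlElementFilter` and `elementToDrop` are hypothetical names. It also assumes the file has no DOCTYPE (the default reader settings prohibit DTDs), and it lets the writer emit its own XML declaration instead of copying the original one.

```csharp
using System;
using System.IO;
using System.Xml;

public static class XmlElementFilter
{
    // Copies XML from input to output one node at a time, skipping every
    // element whose local name equals elementToDrop (hypothetical parameter).
    public static void Copy(TextReader input, TextWriter output, string elementToDrop)
    {
        using (XmlReader reader = XmlReader.Create(input))
        using (XmlWriter writer = XmlWriter.Create(output))
        {
            bool more = reader.Read();
            while (more)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == elementToDrop)
                {
                    reader.Skip();       // moves past the element and its children
                    more = !reader.EOF;  // Skip already advanced; do not Read again
                    continue;
                }
                WriteCurrentNode(reader, writer);
                more = reader.Read();
            }
        }
    }

    // Writes only the node the reader is positioned on, not its children.
    static void WriteCurrentNode(XmlReader reader, XmlWriter writer)
    {
        switch (reader.NodeType)
        {
            case XmlNodeType.Element:
                writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                writer.WriteAttributes(reader, true);
                if (reader.IsEmptyElement) writer.WriteEndElement();
                break;
            case XmlNodeType.EndElement:
                writer.WriteFullEndElement();
                break;
            case XmlNodeType.Text:
                writer.WriteString(reader.Value);
                break;
            case XmlNodeType.CDATA:
                writer.WriteCData(reader.Value);
                break;
            case XmlNodeType.Whitespace:
            case XmlNodeType.SignificantWhitespace:
                writer.WriteWhitespace(reader.Value);
                break;
            case XmlNodeType.Comment:
                writer.WriteComment(reader.Value);
                break;
            case XmlNodeType.ProcessingInstruction:
                writer.WriteProcessingInstruction(reader.Name, reader.Value);
                break;
            // XmlDeclaration is not copied: the writer emits its own.
        }
    }
}
```

For real files the same method can be called with a StreamReader and a StreamWriter; only one node is held in memory at a time, so the file size should not matter.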

I tried doing this with XmlTextReader, but I get a System.Xml.XmlException while reading my file. Screenshot: https://images82.fotosik.pl/622/d98b35587b0befa4.jpg

Code:

XmlTextReader xmlReader = new XmlTextReader(ofd.FileName);
XmlDocument doc = new XmlDocument();
doc.Load(xmlReader);
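The cause of the exception is only visible in the screenshot, so the following is just a diagnostic sketch. Note that XmlDocument.Load builds the entire document tree in memory, which is what the streaming advice is meant to avoid. Reading with a plain XmlReader loop instead and catching XmlException gives the line and position of the first well-formedness error. `XmlCheck` and `FindFirstError` are names invented for this example.

```csharp
using System;
using System.IO;
using System.Xml;

public static class XmlCheck
{
    // Reads the input node by node without building a document tree.
    // Returns null if the whole input parses, otherwise a description
    // of the first error including its line and position.
    public static string FindFirstError(TextReader input)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(input))
            {
                while (reader.Read())
                {
                    // Nothing to do: we only want the parser to walk the file.
                }
            }
            return null;
        }
        catch (XmlException ex)
        {
            return "Line " + ex.LineNumber + ", position " + ex.LinePosition + ": " + ex.Message;
        }
    }
}
```

If this reports an error, the file is not well-formed at that location and any XML API will reject it until that is fixed.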
