Remove text line in XML file C# .NET

I need to write an app that removes a specific line of text from a very large XML file (about 3.5 GB).

I wrote this code:

    string directoryPath;

    OpenFileDialog ofd = new OpenFileDialog();

    private void button1_Click(object sender, EventArgs e)
    {
        ofd.Filter = "XML|*.xml";
        if (ofd.ShowDialog() == DialogResult.OK)
        {
            directoryPath = Path.GetDirectoryName(ofd.FileName);
            textBox2.Text = directoryPath;
            textBox1.Text = ofd.SafeFileName;
        }
    }

    private void Replace()
    {
        StreamReader readerFile = new StreamReader(ofd.FileName, System.Text.Encoding.UTF8);

        while (!readerFile.EndOfStream)
        {
            string stringReplaced;
            string replaceResult = textBox2.Text + "\\" + "replace_results";
            Directory.CreateDirectory(replaceResult);
            StreamWriter writerFile = new StreamWriter(replaceResult + "\\" + textBox1.Text, true);
            StringBuilder sb = new StringBuilder();
            char[] buff = new char[10 * 1024 * 1024];
            int xx = readerFile.ReadBlock(buff, 0, buff.Length);
            sb.Append(buff);
            stringReplaced = sb.ToString();
            stringReplaced = stringReplaced.Replace("line to remove", string.Empty);
            writerFile.WriteLine(stringReplaced);
            writerFile.Close();
            writerFile.Dispose();
            stringReplaced = null;
            sb = null;
        }


        readerFile.Close();
        readerFile.Dispose();
    }

    private void button2_Click(object sender, EventArgs e)
    {
        if (!backgroundWorker1.IsBusy)
        {
            backgroundWorker1.RunWorkerAsync();
            toolStripStatusLabel1.Text = "Replacing in progress...";
        }
    }

    private void backgroundWorker1_DoWork(object sender, DoWorkEventArgs e)
    {
        try
        {
            Replace();
            toolStripStatusLabel1.Text = "Replacing complete!";
        }
        catch
        {
            toolStripStatusLabel1.Text = "Error! Replacing aborted!";
        }
    }
}

It works, but not correctly: the new file (after removing the lines) is bigger than the original, and some junk (lots of dots) is appended at the end. Screenshot:

https://images81.fotosik.pl/615/873833aa0e23b36f.jpg

How can I fix my code so that the new file is the same as the old one, only without the specific lines?

For a start, why keep opening and closing the output file? Keep it open.

Secondly reading blocks – which could lead to "line to remove" being split across blocks – and writing lines is an odd mix.

But I expect your issue is threefold:

  1. You do not set the encoding of the output file.

  2. When you read the buffer (10 MB) you may get fewer characters than requested: ReadBlock returns the number actually read. But you always write the complete block, and the unfilled remainder of the buffer (null characters) is most likely the junk at the end of your file. Limit the write to match the amount read (as updated by the replace).

  3. ReadBlock will include end-of-line characters, but WriteLine adds its own: either work on blocks or on lines. Mixing the two will only create problems (and working on lines also avoids the second issue above).

This leads to code something like:

using (var rdr = OpenReadFile(...))    // e.g. a StreamReader on the input file
using (var wtr = OpenWriteFile(...)) { // e.g. a StreamWriter with an explicit encoding
  string line;
  while ((line = rdr.ReadLine()) != null) {
    line = line.Replace(x, y);
    wtr.WriteLine(line);
  }
}
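
To make the outline above concrete, here is a minimal sketch of the line-based approach. The class name `LineFilter`, the method `RemoveLines`, and its parameters are hypothetical; it assumes the input is UTF-8 and that a line should be dropped when its trimmed text equals the given string exactly (rather than replacing a substring with an empty string, which would leave an empty line behind).

```csharp
using System;
using System.IO;
using System.Text;

static class LineFilter
{
    // Copies inputPath to outputPath, skipping every line whose trimmed text
    // equals lineToRemove. Returns the number of lines that were dropped.
    public static long RemoveLines(string inputPath, string outputPath, string lineToRemove)
    {
        // Assumption: the file is UTF-8; the output is written without a BOM.
        var utf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
        long removed = 0;

        // Both files are opened once and stay open for the whole copy.
        using (var reader = new StreamReader(inputPath, utf8))
        using (var writer = new StreamWriter(outputPath, false, utf8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Trim() == lineToRemove)
                {
                    removed++;   // drop the line entirely, including its line break
                    continue;
                }
                writer.WriteLine(line);
            }
        }
        return removed;
    }
}
```

Only one line is held in memory at a time, so the file size should not matter. Note that WriteLine uses Environment.NewLine, so line endings are normalized and the last line always ends with a line break, which means the output can still differ slightly in size from the input.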

NB Processing XML as text could lead to corrupting the XML (there is no such thing as "invalid XML": either the document is valid XML or it isn't XML, just something that looks a bit like it might be XML). Therefore any such approach needs to be handled with caution. The "proper" answer is to process it as XML with the streaming APIs (XmlReader and XmlWriter), which avoids loading the whole document into memory at once.
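
As a rough illustration of the streaming route, the sketch below copies a document node by node and skips every element with a given local name. It assumes the "line to remove" is really a whole element, that the file has no DTD or entity references (XmlReader.Create prohibits DTDs by default), and that letting the writer emit its own XML declaration is acceptable. `XmlFilter`, `DropElements`, and `elementName` are hypothetical names.

```csharp
using System;
using System.Xml;

static class XmlFilter
{
    // Streams inputPath to outputPath, omitting every element (and its subtree)
    // whose local name equals elementName.
    public static void DropElements(string inputPath, string outputPath, string elementName)
    {
        using (var reader = XmlReader.Create(inputPath))
        using (var writer = XmlWriter.Create(outputPath))
        {
            reader.Read();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == elementName)
                {
                    reader.Skip();   // jumps past the subtree; reader is already on the next node
                    continue;
                }
                CopyCurrentNode(reader, writer);
                reader.Read();
            }
        }
    }

    // Writes only the node the reader is positioned on, not its children.
    static void CopyCurrentNode(XmlReader reader, XmlWriter writer)
    {
        switch (reader.NodeType)
        {
            case XmlNodeType.Element:
                bool isEmpty = reader.IsEmptyElement;
                writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                writer.WriteAttributes(reader, true);
                if (isEmpty) writer.WriteEndElement();
                break;
            case XmlNodeType.EndElement:
                writer.WriteFullEndElement();
                break;
            case XmlNodeType.Text:
                writer.WriteString(reader.Value);
                break;
            case XmlNodeType.CDATA:
                writer.WriteCData(reader.Value);
                break;
            case XmlNodeType.Whitespace:
            case XmlNodeType.SignificantWhitespace:
                writer.WriteWhitespace(reader.Value);
                break;
            case XmlNodeType.Comment:
                writer.WriteComment(reader.Value);
                break;
            case XmlNodeType.ProcessingInstruction:
                writer.WriteProcessingInstruction(reader.Name, reader.Value);
                break;
            // XmlDeclaration is skipped on purpose: the writer emits its own.
        }
    }
}
```

Whitespace around a dropped element is kept, so the output may contain a blank line where the element used to be.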

I tried doing this with XmlTextReader, but I get a System.Xml.XmlException while reading my file. Screenshot: https://images82.fotosik.pl/622/d98b35587b0befa4.jpg

Code:

XmlTextReader xmlReader = new XmlTextReader(ofd.FileName);
XmlDocument doc = new XmlDocument();
doc.Load(xmlReader);
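
Note that XmlDocument.Load builds the entire document in memory even when it is handed a reader, so it is unlikely to suit a file of this size. The cause of the exception is not visible from the code above; one way to narrow it down is to read the file node by node and report where the parser stops. The following is only a diagnostic sketch, and `XmlProbe` and `FindFirstError` are hypothetical names.

```csharp
using System;
using System.Xml;

static class XmlProbe
{
    // Reads the file node by node without building a DOM.
    // Returns null if the parser reaches the end without complaint,
    // otherwise a message giving the position of the first error.
    public static string FindFirstError(string path)
    {
        try
        {
            using (var reader = XmlReader.Create(path))
            {
                while (reader.Read()) { }   // just advance; nothing is kept in memory
            }
            return null;
        }
        catch (XmlException ex)
        {
            // LineNumber and LinePosition point at the place where parsing failed.
            return $"Line {ex.LineNumber}, position {ex.LinePosition}: {ex.Message}";
        }
    }
}
```

The reported line and position should show whether the file itself is not well-formed at that point, in which case any XML API will refuse it and only a text-based cleanup can get past that spot.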
