
Delete Headers and Trailer from File C# File.ReadAllLines

I'm trying to read a file and produce a new file without lines that contain "HEADER" or "TRAILER". Below is my code. When I put a breakpoint at text[i].Remove(i), it appears to execute that code but the text variable never removes the line. Any help would be greatly appreciated.

        var text = File.ReadAllLines(fileName);
        int i = 0;
        foreach (string line in text)
        {
            if (line.Substring(0, 20).Contains("HEADER") || line.Substring(0, 20).Contains("TRAILER"))
            {
                text[i].Remove(i);
            }
            else
            { 
            i++;
            }
        }
        string newFN = fileName + "b";
        File.WriteAllLines(newFN, text);

You cannot modify a collection while you are enumerating it with foreach. Either create a new writable collection and add only the lines that match your predicate, or use LINQ to build a new enumerable with your criteria already applied and then convert it to an array, a list, or whatever collection you need.
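
The "new writable collection" approach could look roughly like the sketch below. The class and method names (HeaderTrailerFilter, KeepDataLines, WriteFiltered) are made up for illustration, and the check is applied to the whole line rather than only its first 20 characters, which is an assumption about what the question needs.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class HeaderTrailerFilter
{
    // Hypothetical helper: copies every line that mentions neither marker
    // into a new list, leaving the source sequence untouched.
    public static List<string> KeepDataLines(IEnumerable<string> lines)
    {
        var kept = new List<string>();
        foreach (string line in lines)
        {
            if (!line.Contains("HEADER") && !line.Contains("TRAILER"))
            {
                kept.Add(line);
            }
        }
        return kept;
    }

    // Reads fileName and writes the filtered lines to fileName + "b",
    // the same output name the question uses.
    public static void WriteFiltered(string fileName)
    {
        File.WriteAllLines(fileName + "b", KeepDataLines(File.ReadAllLines(fileName)));
    }
}
```

Calling HeaderTrailerFilter.WriteFiltered(fileName) would then replace the loop in the question.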

ReadAllLines returns a string array. You can use LINQ to project it into a new sequence:

// Blanks out lines that contain "HEADER" (they become empty lines; use Where instead to drop them)
var text = File.ReadAllLines(fileName).Select(line => line.Contains("HEADER") ? "" : line);

File.WriteAllLines(newFN, text);

File.WriteAllLines(filename + "b", File.ReadAllLines(filename)
    // Test only the first 20 characters, but keep the full line for output
    .Select(l => new { Line = l, Head = l.Substring(0, Math.Min(20, l.Length)) })
    .Where(x => !x.Head.Contains("HEADER") && !x.Head.Contains("TRAILER"))
    .Select(x => x.Line));

This reads all the lines, takes the first 20 characters of each line (or the whole line if it is shorter), uses Where to exclude every line whose first 20 characters contain HEADER or TRAILER, and then writes the remaining full lines to the new file.

I would use ReadLines instead of ReadAllLines as it allows you to enumerate while it is still reading more lines of the file. This way you do not have to read the entire file into memory before you start writing out the new file. You could then simplify your code down to this one line:

File.WriteAllLines(fileName + "b", File.ReadLines(fileName).Where(line => !line.Contains("HEADER") && !line.Contains("TRAILER")));

This will cause it to leave out the HEADER and TRAILER lines when writing the new file.

Also, to answer your question more completely: String.Remove(int) removes all characters from the given index to the end of the string and returns a new string. Strings in .NET are immutable, so it does not modify the current string; it just gives you back a new one, which your code discards. In addition, each call to Substring in your comparisons creates a new string instance only so that you can check whether it contains the marker. It would be better to call Contains on the line directly.
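
A small sketch of that behaviour, using a made-up sample line and a hypothetical RemoveDemo class. Note that even assigning the result back to text[i] would only shorten that one string; it would not delete the element from the array.

```csharp
using System;

static class RemoveDemo
{
    // Returns the original string and the shortened copy so both can be inspected.
    public static string[] Run()
    {
        string line = "HEADER20240101";   // made-up sample line

        line.Remove(6);                   // result is discarded; line is unchanged

        string shortened = line.Remove(6); // new string holding the characters before index 6

        return new[] { line, shortened };
    }

    public static void Main()
    {
        foreach (string s in Run())
        {
            Console.WriteLine(s);
        }
    }
}
```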

var lines = File.ReadLines(fileName);
var filtered = lines.Where(line => !line.Contains("HEADER") && !line.Contains("TRAILER"));
File.WriteAllLines(fileName + "b", filtered);     // or fileName.Replace(".txt", "b.txt") ?

.Substring(0, 20) allocates memory for a new string and throws an ArgumentOutOfRangeException for lines that have fewer than 20 characters, so in most cases plain .Contains will be faster and safer. Alternatively, you can use .IndexOf to limit the search to the first 20 characters:

line.IndexOf("HEADER", 0, Math.Min(20, line.Length), StringComparison.OrdinalIgnoreCase) < 0
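
Wrapped into a reusable predicate, the IndexOf check might look like the sketch below. MarkerCheck and IsMarkerLine are hypothetical names, the 20-character window comes from the question, and the comparison is case-sensitive (StringComparison.Ordinal) to match the question's use of Contains; switch to OrdinalIgnoreCase if the markers can vary in case.

```csharp
using System;

static class MarkerCheck
{
    // True when "HEADER" or "TRAILER" occurs entirely within the first
    // prefixLength characters of the line. Lines shorter than the window
    // are searched in full, so no exception is thrown for them.
    public static bool IsMarkerLine(string line, int prefixLength = 20)
    {
        int count = Math.Min(prefixLength, line.Length);
        return line.IndexOf("HEADER", 0, count, StringComparison.Ordinal) >= 0
            || line.IndexOf("TRAILER", 0, count, StringComparison.Ordinal) >= 0;
    }
}
```

It could then be used as File.ReadLines(fileName).Where(line => !MarkerCheck.IsMarkerLine(line)).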

I am guessing that RegEx might be a bit faster by avoiding some of the extra memory allocations:

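// Requires: using System.Text.RegularExpressions;
string text = File.ReadAllText(fileName);
// Remove each whole line (including its line break) that contains HEADER or TRAILER
string result = Regex.Replace(text, @"^.*(?:HEAD|TRAIL)ER.*(?:\r?\n|$)", "", RegexOptions.Multiline);
File.WriteAllText(fileName + "b", result);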

