简体   繁体   中英

Replacing a string within a stream in C# (without overwriting the original file)

I have a file that I'm opening into a stream and passing to another method. However, I'd like to replace a string in the file before passing the stream to the other method. So:

string path = "C:/...";
Stream s = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);
//need to replace all occurrences of "John" in the file to "Jack" here.
CallMethod(s);

The original file should not be modified, only the stream. What would be the easiest way to do this?

Thanks...

It's a lot easier if you just read in the file as lines, and then deal with those, instead of forcing yourself to stick with a Stream , simply because stream deals with both text and binary files, and needs to be able to read in one character at a time (which makes such replacement very hard). If you read in a whole line at a time (so long as you don't have multi-line replacement) it's quite easy.

var lines = File.ReadLines(path)
    .Select(line => line.Replace("John", "Jack"));

Note that ReadLines still does stream the data, and Select doesn't need to materialize the whole thing, so you're still not reading the whole file into memory at one time when doing this.

If you don't actually need to stream the data you can easily just load it all as one big string, do the replace, and then create a stream based on that one string:

string data = File.ReadAllText(path)
    .Replace("John", "Jack");
byte[] bytes = Encoding.ASCII.GetBytes(data);
Stream s = new MemoryStream(bytes);

This question probably has many good answers. I'll try one I've used and has always worked for me and my peers.

I suggest you create a separate stream, say a MemoryStream . Read from your filestream and write into the memory one. You can then extract strings from either and replace stuff, and you would pass the memory stream ahead. That makes it double sure that you are not messing up with the original stream, and you can ever read the original values from it whenever you need, though you are using basically twice as much memory by using this method.

If the file has extremely long lines, the replaced string may contain a newline or there are other constraints preventing the use of File.ReadLines() while requiring streaming, there is an alternative solution using streams only, even though it is not trivial .

Implement your own stream decorator (wrapper) that performs the replacement. Ie a class based on Stream that takes another stream in its constructor, reads data from the stream in its Read(byte[], int, int) override and performs the replacement in the buffer. See notes to Stream implementers for further requirements and suggestions.

Let's call the string being replaced "needle", the source stream "haystack" and the replacement string "replacement".

Needle and replacement need to be encoded using the encoding of the haystack contents (typically Encoding.UTF8.GetBytes() ). Inside streams, the data is not converted to string, unlike in StreamReader.ReadLine() . Thus unnecessary memory allocation is prevented.

Simple cases : If both needle and replacement are just a single byte, the implementation is just a simple loop over the buffer, replacing all occurrences. If needle is a single byte and replacement is empty (ie deleting the byte, eg deleting carriage return for line ending normalization), it is a simple loop maintaining from and to indexes to the buffer, rewriting the buffer byte by byte.

In more complex cases, implement the KMP algorithm to perform the replacement.

  • Read the data from the underlying stream (haystack) to an internal buffer that is at least as long as needle and perform the replacement while rewriting the data to the output buffer. The internal buffer is needed so that data from a partial match are not published before a complete match is detected -- then, it would be too late to go back and delete the match completely.

  • Process the internal buffer byte by byte, feeding each byte into the KMP automaton. With each automaton update, write the bytes it releases to the appropriate position in output buffer.

  • When a match is detected by KMP, replace it: reset the automaton keeping the position in the internal buffer (which deletes the match) and write the replacement in the output buffer.

  • When end of either buffer is reached, keep the unwritten output and unprocessed part of the internal buffer including current partial match as a starting point for next call to the method and return the current output buffer. Next call to the method writes the remaining output and starts processing the rest of haystack where the current one stopped.

  • When end of haystack is reached, release the current partial match and write it to the output buffer.

Just be careful not to return an empty output buffer before processing all the data of haystack -- that would signal end of stream to the caller and therefore truncate the data.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM