
Java - Dynamic String replacement inside a Reader stream

I have a (text) file on disk, which I need to read into a library that takes a Reader object.

While reading this file, I want to perform a regex String replacement on the data.

My current solution is to read the whole file into memory as one String, do the String replacement, and then create a StringReader for this String and pass it back into the library as the Reader.

This works, but with large files (especially when running in multiple threads), performance is an issue.

What I would like is for the Reader to read one line at a time from the file, apply the replacement to that line, and then hand the result transparently to the consumer of the Reader. I can't think of how to do this.

Is there a better way to achieve this task?

I am using Java 7

An example of my current solution is below - reading from 'file', replacing all 'a's with 'b's and then passing the Stream to the consumer.

public void loadFile(final File file) throws Exception
{
    final Pattern regexPattern = Pattern.compile("a");
    final String replacementString = "b";

    try (BufferedReader cleanedBufferedReader = new BufferedReader(new StringReader(replaceInBufferedReader(new BufferedReader(new FileReader(file)),
            regexPattern, replacementString))))
    {
        new StreamSource(cleanedBufferedReader).doSomething();
    }
}

private static String replaceInBufferedReader(final BufferedReader reader, final Pattern pattern, final String replacement) throws IOException
{
    final StringBuilder builder = new StringBuilder();
    String str;

    while ((str = reader.readLine()) != null)
    {
        builder.append(str).append(System.lineSeparator());
    }

    return pattern.matcher(builder.toString()).replaceAll(replacement);
}

You just want to subclass BufferedReader.

class MyBufferedReader extends BufferedReader {

    MyBufferedReader(Reader r) {
        super(r);
    }

    @Override
    public String readLine() throws IOException {
        String line = super.readLine();
        // perform replacement here (line is null at end of stream)
        return line;
    }

}

Open your file as usual, but instead of wrapping it in a BufferedReader, wrap it in your subclass.

try (Reader r = ...;
     BufferedReader br = new MyBufferedReader(r)) {
     String line;
     while ((line = br.readLine()) != null) {
         // use returned line
     }
}
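
As a minimal sketch of what the subclass might look like with the replacement filled in: the class name ReplacingLineReader, its constructor parameters, and the readAllLines helper are assumptions made for illustration, not part of the original answer. Note that only callers of readLine() see the replaced text; a consumer that calls read(char[], int, int) directly bypasses the override.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.regex.Pattern;

// Hypothetical name: applies a regex replacement to each line returned by readLine().
class ReplacingLineReader extends BufferedReader {
    private final Pattern pattern;
    private final String replacement;

    ReplacingLineReader(Reader in, Pattern pattern, String replacement) {
        super(in);
        this.pattern = pattern;
        this.replacement = replacement;
    }

    @Override
    public String readLine() throws IOException {
        String line = super.readLine();
        // null signals end of stream and is passed through unchanged
        return line == null ? null : pattern.matcher(line).replaceAll(replacement);
    }

    // Demo helper (assumption): reads every line and joins them with '\n'.
    static String readAllLines(String input, String regex, String replacement) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new ReplacingLineReader(
                new StringReader(input), Pattern.compile(regex), replacement)) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }
}
```

Because the pattern is applied per line, a match that would span a line break is never found.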

Update

The following is a Reader which will allow you to do line-by-line replacements of an input stream, while still presenting a Reader interface to the user of the stream.

Internally, the original stream is wrapped in a BufferedReader and read one line at a time. Any desired transformation may be performed on the lines which have been read. The transformed line is then turned into a StringReader. When the user of the stream calls any of the read(...) operations, the request is directed to the buffered StringReader to satisfy. If the StringReader runs out of characters, the next line of the BufferedReader is loaded and transformed, to continue to provide input for the read(...).

abstract public class TranslatingReader extends Reader {

    private BufferedReader input;
    private StringReader output;

    public TranslatingReader(Reader in) {
        input = new BufferedReader(in);
        output = new StringReader("");
    }

    abstract public String translate(String line);

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int read = 0;

        while (len > 0) {
            int nchars = output.read(cbuf, off, len);
            if (nchars == -1) {
                String line = input.readLine();
                if (line == null) {
                    break;
                }

                line = translate(line);

                line += "\n"; // Add the newline which was removed by readLine()
                output = new StringReader(line);
            } else {
                read += nchars;
                off += nchars;
                len -= nchars;
            }
        }

        if (read == 0)
            read = -1;

        return read;
    }

    @Override
    public void close() throws IOException {
        input.close();
        output.close();
    }
}
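
A concrete subclass for the regex case might look like the sketch below. To keep it compilable on its own, it includes a condensed stand-in for the reader above (named LineTranslatingReader here); that stand-in, the RegexReplacingReader name, and the drain helper are all assumptions for illustration. Unlike the original, the stand-in returns at most one line per read call, which the Reader contract permits.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.regex.Pattern;

// Condensed stand-in for the line-translating reader, so this sketch compiles alone.
abstract class LineTranslatingReader extends Reader {
    private final BufferedReader input;
    private String current = ""; // translated line currently being served
    private int pos = 0;         // next unread index in 'current'

    LineTranslatingReader(Reader in) {
        input = new BufferedReader(in);
    }

    abstract String translate(String line);

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (pos >= current.length()) {
            String line = input.readLine();
            if (line == null) {
                return -1; // underlying stream is exhausted
            }
            current = translate(line) + "\n"; // restore the newline readLine() removed
            pos = 0;
        }
        int n = Math.min(len, current.length() - pos);
        current.getChars(pos, pos + n, cbuf, off);
        pos += n;
        return n;
    }

    @Override
    public void close() throws IOException {
        input.close();
    }
}

// Hypothetical concrete subclass: replaces every match of 'pattern' in each line.
class RegexReplacingReader extends LineTranslatingReader {
    private final Pattern pattern;
    private final String replacement;

    RegexReplacingReader(Reader in, Pattern pattern, String replacement) {
        super(in);
        this.pattern = pattern;
        this.replacement = replacement;
    }

    @Override
    String translate(String line) {
        return pattern.matcher(line).replaceAll(replacement);
    }

    // Demo helper (assumption): drains a Reader through read(char[], int, int),
    // the way a consuming library would, using a tiny buffer to force refills.
    static String drain(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4];
        int n;
        while ((n = reader.read(buf, 0, buf.length)) != -1) {
            sb.append(buf, 0, n);
        }
        reader.close();
        return sb.toString();
    }
}
```

An instance can be handed to any API that accepts a Reader, for example new RegexReplacingReader(new FileReader(file), Pattern.compile("a"), "b"). Two side effects of going through readLine() are worth knowing: "\r\n" line endings come out as "\n", and the last line gains a trailing newline even if the file had none.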

[edit] OP edited the question so this is no longer relevant

I expect that the file you have is not monolithic, since you're using a character-based Reader. If the data is not monolithic, it must have some separators which split the file into records. Usually these separators are newlines and/or carriage returns, forming 'line of text' records.

Split your data into records according to the separators, and pass each record through the regex. For lines of text, you may be able to use BufferedReader.readLine().
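
The record-by-record idea can be sketched as a copy loop from a Reader to a Writer, so that only one record is held in memory at a time. The class name RecordReplacer, the method names, and the choice of '\n' as the output separator are assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for record-wise replacement.
class RecordReplacer {

    // Copies 'in' to 'out' one line (record) at a time, applying the regex to each record.
    static void copyReplacing(Reader in, Writer out, Pattern pattern, String replacement)
            throws IOException {
        BufferedReader reader = new BufferedReader(in);
        Matcher matcher = pattern.matcher(""); // reused for every record via reset()
        String record;
        while ((record = reader.readLine()) != null) {
            out.write(matcher.reset(record).replaceAll(replacement));
            out.write('\n'); // readLine() strips the separator, so write one back
        }
        out.flush();
    }

    // Demo helper (assumption): runs the copy over in-memory text, replacing 'a' with 'b'.
    static String demo(String text) throws IOException {
        StringWriter out = new StringWriter();
        copyReplacing(new StringReader(text), out, Pattern.compile("a"), "b");
        return out.toString();
    }
}
```

This suits the case where the result can be written somewhere (a file, a pipe) rather than handed back as a Reader.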

Another idea, without any overriding, is to use a Scanner with your pattern as a custom delimiter. This does not read the whole file at once, only the part up to the next match of the pattern on each iteration, which keeps the reading side memory-efficient. It could look something like the code below (you can adapt it to your needs).

PS about performance: I think this approach could even perform better than blindly reading line by line. Some cases, for instance:

  • Many lines contain nothing to substitute, yet each of them is still read in as a separate line.
  • The text file has (oddly) been saved as one large single line, without any \n. This can happen after a bad export to file or during information retrieval.

Feel free to take a look at this alternative solution:

private static String replaceInBufferedReader(String pathToFile) {
    File some = new File(pathToFile);
    StringBuilder sb = new StringBuilder();
    String replacementString = "b";
    String delimiter = "x";    // you can use a pattern or regex

    // Set the Scanner's delimiter to the pattern you want to replace
    try (Scanner sc = new Scanner(some).useDelimiter(delimiter)) {
        while (sc.hasNext()) {
            // Caveat: the replacement is appended after every token, including the
            // last one, so the output gains a trailing replacement when the input
            // does not end with the delimiter.
            sb.append(sc.next()).append(replacementString);
        }
    }
    catch (FileNotFoundException e) {
        e.printStackTrace();
    }
    return sb.toString();  // or maybe save to a new file
}

I tested it with an 8 MB text file and it handled it easily. Instead of returning sb.toString(), I wrote the result to a new file:

...
try {
    Files.write(Paths.get("some2.txt"),
            sb.toString().getBytes(),
            StandardOpenOption.CREATE);
}
catch (IOException e) {
    e.printStackTrace();
}
