Java - Dynamic String replacement inside a Reader stream

I have a (text) file on disk, which I need to read into a library that takes a Reader object.

While reading this file, I want to perform a regex String replacement on the data.

My current solution is to read the whole file into memory as one String, do the String replacement, and then create a StringReader for this String and pass it to the library as the Reader.

This works, but with large files (especially when running in multiple threads), performance is an issue.

What I would like to do is read the file one line at a time, perform the replacement on that line, and then silently return the result to the consumer of the Reader, but I can't think of how to do this.

Is there a better way to achieve this task?

I am using Java 7.

An example of my current solution is below: it reads from 'file', replaces all 'a's with 'b's, and then passes the stream to the consumer.

public void loadFile(final File file) throws Exception
{
    final Pattern regexPattern = Pattern.compile("a");
    final String replacementString = "b";

    try (BufferedReader cleanedBufferedReader = new BufferedReader(new StringReader(replaceInBufferedReader(new BufferedReader(new FileReader(file)),
            regexPattern, replacementString))))
    {
        new StreamSource(cleanedBufferedReader).doSomething();
    }
}

private static String replaceInBufferedReader(final BufferedReader reader, final Pattern pattern, final String replacement) throws IOException
{
    final StringBuilder builder = new StringBuilder();
    String str;

    while ((str = reader.readLine()) != null)
    {
        builder.append(str).append(System.lineSeparator());
    }

    return pattern.matcher(builder.toString()).replaceAll(replacement);
}

You just want to subclass BufferedReader.

class MyBufferedReader extends BufferedReader {

    MyBufferedReader(Reader r) {
        super(r);
    }

    @Override
    public String readLine() throws IOException {
        String line = super.readLine();
        // perform replacement here (line is null at end of stream)
        return line;
    }

}

Open your file as usual, but instead of wrapping it in a BufferedReader, wrap it in your subclass.

try ( Reader r = ...;
          BufferedReader br = new MyBufferedReader(r)) {
     String line;
     while ((line = br.readLine()) != null) {
         // use returned line
     }
}
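As a minimal sketch of the "perform replacement here" step, the subclass could take the regex and the replacement as constructor arguments. The class name ReplacingBufferedReader and the demo helper are made up for illustration; they are not part of the answer above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.regex.Pattern;

// Hypothetical name: a BufferedReader whose readLine() applies a regex replacement.
class ReplacingBufferedReader extends BufferedReader {

    private final Pattern pattern;
    private final String replacement;

    ReplacingBufferedReader(Reader in, Pattern pattern, String replacement) {
        super(in);
        this.pattern = pattern;
        this.replacement = replacement;
    }

    @Override
    public String readLine() throws IOException {
        String line = super.readLine();
        // null signals end of stream and must be passed through untouched
        return line == null ? null : pattern.matcher(line).replaceAll(replacement);
    }

    // Illustration only: reads every line of 'text', replacing 'a' with 'b',
    // and joins the transformed lines with '\n' so the result is easy to inspect.
    static String demo(String text) {
        StringBuilder out = new StringBuilder();
        try (BufferedReader br = new ReplacingBufferedReader(
                new StringReader(text), Pattern.compile("a"), "b")) {
            String line;
            while ((line = br.readLine()) != null) {
                out.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

Only readLine() is overridden in this sketch, so a consumer that pulls characters through read(...) would still receive the unmodified text.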

Update

The following is a Reader which will allow you to do line-by-line replacements of an input stream, while still presenting a Reader interface to the user of the stream.

Internally, the original stream is wrapped in a BufferedReader and read one line at a time. Any desired transformation may be performed on the lines which have been read. The transformed line is then turned into a StringReader. When the user of the stream calls any of the read(...) operations, the request is directed to the buffered StringReader to satisfy. If the StringReader runs out of characters, the next line of the BufferedReader is loaded and transformed, to continue to provide input for the read(...).

abstract public class TranslatingReader extends Reader {

    private BufferedReader input;
    private StringReader output;

    public TranslatingReader(Reader in) {
        input = new BufferedReader(in);
        output = new StringReader("");
    }

    abstract public String translate(String line);

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int read = 0;

        while (len > 0) {
            int nchars = output.read(cbuf, off, len);
            if (nchars == -1) {
                String line = input.readLine();
                if (line == null) {
                    break;
                }

                line = translate(line);

                line += "\n"; // Add the newline which was removed by readLine()
                output = new StringReader(line);
            } else {
                read += nchars;
                off += nchars;
                len -= nchars;
            }
        }

        if (read == 0)
            read = -1;

        return read;
    }

    @Override
    public void close() throws IOException {
        input.close();
        output.close();
    }
}
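For the regex case in the question, the same idea can be written as a concrete class. The following is a self-contained sketch under that assumption: the name RegexReplacingReader and the demo helper are hypothetical, and the read loop follows the approach described above rather than any library API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.regex.Pattern;

// Hypothetical concrete variant: replaces regex matches line by line
// while still looking like a plain Reader to the consumer.
class RegexReplacingReader extends Reader {

    private final BufferedReader input;
    private final Pattern pattern;
    private final String replacement;
    private StringReader current = new StringReader(""); // holds the transformed line

    RegexReplacingReader(Reader in, Pattern pattern, String replacement) {
        this.input = new BufferedReader(in);
        this.pattern = pattern;
        this.replacement = replacement;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int total = 0;
        while (len > 0) {
            int n = current.read(cbuf, off, len);
            if (n == -1) {
                // current line exhausted: load and transform the next one
                String line = input.readLine();
                if (line == null) {
                    break; // underlying stream is finished
                }
                // readLine() strips the terminator, so add a newline back
                current = new StringReader(pattern.matcher(line).replaceAll(replacement) + "\n");
            } else {
                total += n;
                off += n;
                len -= n;
            }
        }
        return total == 0 ? -1 : total;
    }

    @Override
    public void close() throws IOException {
        input.close();
    }

    // Illustration only: drains the reader through a buffer of the given size,
    // replacing 'a' with 'b'.
    static String demo(String text, int bufferSize) {
        StringBuilder out = new StringBuilder();
        char[] buf = new char[bufferSize];
        try (Reader r = new RegexReplacingReader(new StringReader(text), Pattern.compile("a"), "b")) {
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                out.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

Two consequences of working line by line: a pattern that would match across a line break never sees both halves, and every line comes back terminated by "\n", even a final line that had no terminator in the file.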

[edit] OP edited the question so this is no longer relevant

I expect that the file you have is not monolithic, since you're using a character Reader. If the data is not monolithic, it must have some separators which split the file into records. Usually these separators are newlines and/or carriage returns, forming 'line of text' records.

Split your data into records according to the separators, and pass each record through the regex. In the case of text lines, you may be able to use BufferedReader.readLine().

Another idea that needs no overriding would be to use a Scanner with your pattern as a custom delimiter. This won't read the whole file at once, but only the part up to the next match of the pattern on each iteration. That makes the reading side memory-efficient. It could look something like this (you can adapt it to your needs):

PS about #performance: I think this approach could even be more performant than blindly reading line by line. Some cases, for instance:

  • Many lines contain nothing to substitute, yet they are still read in and processed!
  • The text file has been (oddly) saved as one large single line! (Without any \n characters. This can happen through a bad export to file or during information retrieval.)

Feel free to take a look at this alternative solution ↓

private static String replaceInBufferedReader(String pathToFile) {

    File some = new File(pathToFile);
    StringBuilder sb = new StringBuilder();
    String replacementString = "b";
    String delimiter = "x";    // Scanner treats this String as a regex

    // set Scanner's delimiter to the pattern you want to replace;
    // try-with-resources closes the Scanner even if reading fails
    try (Scanner sc = new Scanner(some).useDelimiter(delimiter)) {
        while (sc.hasNext()) {
            // note: the replacement is appended after every token, including the last one
            sb.append(sc.next()).append(replacementString);
        }
    }
    catch (FileNotFoundException e) {
        e.printStackTrace();
    }
    return sb.toString();  // or maybe save to new file
}

I tested it with an 8MB text file and it was a piece of cake. I saved the result as a new file instead of returning sb.toString():

...
try {
    Files.write(Paths.get("some2.txt"),
            sb.toString().getBytes(),
            StandardOpenOption.CREATE);
}
catch (IOException e) {
    e.printStackTrace();
}
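If the goal is to avoid holding the whole result in memory, the tokens can be written to a Writer as they are scanned instead of being collected in a StringBuilder first. This is only a sketch of that variation: ScannerReplacer, replace and demo are hypothetical names. It writes the replacement between tokens rather than after each one, and treats the replacement as literal text (no $1-style group references).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical helper: streams Scanner tokens straight to a Writer.
class ScannerReplacer {

    // Copies 'in' to 'out', writing 'replacement' wherever 'delimiter'
    // separated two tokens. The caller remains responsible for closing both streams.
    static void replace(Reader in, Writer out, Pattern delimiter, String replacement)
            throws IOException {
        Scanner sc = new Scanner(in).useDelimiter(delimiter);
        boolean first = true;
        while (sc.hasNext()) {
            if (!first) {
                out.write(replacement); // a delimiter was consumed between two tokens
            }
            out.write(sc.next());
            first = false;
        }
        // Scanner swallows IOExceptions from its source; surface them here
        IOException failure = sc.ioException();
        if (failure != null) {
            throw failure;
        }
        out.flush();
    }

    // Illustration only: replaces every "x" in 'text' with "b".
    static String demo(String text) {
        StringWriter out = new StringWriter();
        try {
            replace(new StringReader(text), out, Pattern.compile("x"), "b");
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

For file output, a BufferedWriter over the target file could be passed as 'out'. One caveat to check against your data: a match at the very start or very end of the input is not surrounded by two tokens, so this sketch may not emit a replacement there.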
