
Java: Write and Read a file simultaneously

This is actually a design question. I am not sure whether writing and reading a file is the ideal solution here. Nonetheless, here is what I am trying to do: I have the following static method. Once the reqStreamingData method of cs is called, it starts retrieving data from the client server continuously, one line every 150 milliseconds.

    public static void streamingDataOperations(ClientSocket cs) throws InterruptedException, IOException {
        // Retrieve streaming data continuously from the client server and
        // write a line to the CSV file every 150 milliseconds,
        // using a BufferedWriter and a PrintWriter (print method).
        // Note that the BufferedWriter's flush method is never called,
        // so I assume the data is actually sitting in the in-memory buffer,
        // not in the file itself.
        cs.reqStreamingData(output_file); // <- this method comes from the client's API.

        // I would like another thread (the data processing thread) that runs every 15 minutes.
        // I know I can do that by creating a class that extends TimerTask and giving it a fixed schedule.
        // When this thread runs, I want it to:
        // 1. flush the last 15 minutes of data to output_file (note that no synchronized method or block is used here, so no object is locked);
        // 2. process the data in R;
        // 3. wait for R's output to come back;
        // 4. clear the file contents, so that it only ever holds data from the last 15 minutes.
    }
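The scheduling part described in the comments (a separate thread that wakes up every 15 minutes) can be sketched with `java.util.Timer` and a `TimerTask`, as the question suggests. This is a minimal sketch under assumptions: the class `ProcessingScheduler` is hypothetical, and `processingStep` is a placeholder for the flush / run R / wait / clear sequence. A `Timer` runs all of its tasks on one background thread, and an exception escaping a task terminates that thread, which is why the sketch catches runtime exceptions. `ScheduledExecutorService`, used in the answer below, is the more common modern alternative.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper: runs a processing step periodically on a background thread.
 * Usage for the question's case (placeholder step):
 *   new ProcessingScheduler().start(() -> { flush, run R, wait, clear }, 15, TimeUnit.MINUTES);
 */
final class ProcessingScheduler {
    // daemon thread, so it does not keep the JVM alive on its own
    private final Timer timer = new Timer("data-processing", true);

    /** First run after one period, then once per period (fixed rate). */
    void start(Runnable processingStep, long period, TimeUnit unit) {
        long periodMillis = unit.toMillis(period);
        timer.scheduleAtFixedRate(new TimerTask() {
            @Override
            public void run() {
                try {
                    processingStep.run();
                } catch (RuntimeException e) {
                    // an uncaught exception would kill the Timer thread
                    // and silently stop all future runs, so log and continue
                    System.err.println("Processing failed: " + e);
                }
            }
        }, periodMillis, periodMillis);
    }

    void stop() {
        timer.cancel();
    }
}
```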

I am not well versed in multithreading. My concerns are:

  1. The request data thread and the data processing thread would be writing and reading the same file simultaneously, but at different rates. Since the data processing thread has a much heavier computational load than the request data thread, I am not sure whether it would delay the request data thread significantly. And given that they are two separate threads, could any error or exception occur here?
  2. I do not much like the idea of writing and reading the same file at the same time, but because I have to use R to process the data and store it in an R data frame in real time, I cannot think of another way to approach this. Are there any better alternatives?
  3. Is there a better design to tackle this problem?

I understand this is a lengthy question. Please let me know if you need more information.

The lines (CSV or any other text) can be written to a temporary file. When the processing side is ready to pick up a batch, the only synchronization needed is at the moment the temporary file is replaced by a new one. This guarantees that the producer never writes to a file that the consumer is processing at the same time.
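The key property, that producer and consumer only contend for the lock during the swap, does not depend on files. As a minimal in-memory sketch of the same swap pattern, the hypothetical class `SwappableBatch` below uses a `List<String>` in place of the temporary file:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for the temp-file batch: same swap-under-lock idea. */
final class SwappableBatch {
    private final Object lock = new Object();
    private List<String> current = new ArrayList<>(); // guarded by lock

    /** Producer side: append one line, holding the lock only briefly. */
    void add(String line) {
        synchronized (lock) {
            current.add(line);
        }
    }

    /**
     * Consumer side: return everything collected so far and start a fresh batch.
     * Only the reference swap happens under the lock; the caller processes the
     * returned list afterwards without blocking the producer.
     */
    List<String> drain() {
        List<String> fresh = new ArrayList<>(); // allocate outside the lock
        synchronized (lock) {
            List<String> previous = current;
            current = fresh;
            return previous;
        }
    }
}
```

Because the consumer owns the returned list exclusively after `drain()`, it can take as long as it likes to process it; the producer only ever waits for a reference assignment.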

Once the swap is done, the producer continues adding lines to the new file, while the consumer flushes and closes the old file and then moves it to the location your R application expects.
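The hand-over step (move the closed file to where R reads it) is best done as an atomic rename, so R never sees a half-written file. According to the `Files.move` documentation, an `ATOMIC_MOVE` throws `AtomicMoveNotSupportedException` when it cannot be done (for example across file systems), all other options are ignored, and whether an existing target is replaced is implementation specific. A hedged sketch with a non-atomic fallback; the class `BatchPublisher` and method `publish` are hypothetical names:

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper for the consumer's final step. */
final class BatchPublisher {
    /**
     * Moves a finished (already closed) batch file to the path the R side reads.
     * Tries an atomic rename first; falls back to a plain replace, which is
     * NOT atomic (it may copy and delete), if the file system cannot rename.
     */
    static void publish(Path finishedBatch, Path target) throws IOException {
        try {
            Files.move(finishedBatch, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            // e.g. source and target are on different file systems
            Files.move(finishedBatch, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```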

To clarify the approach further, here is a sample implementation:

    import java.io.IOException;
    import java.io.PrintWriter;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;
    import java.time.LocalDateTime;
    import java.time.format.DateTimeFormatter;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.ScheduledFuture;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.TimeoutException;

    // the wrapper class name is arbitrary
    public class FileBasedBatchDemo {

        public static void main(String[] args) throws IOException {
            // in this sample these dirs are supposed to exist, on the same file system
            // (ATOMIC_MOVE in drainTo throws AtomicMoveNotSupportedException otherwise)
            final String workingDirectory = "./data/tmp";
            final String outputDirectory = "./data/csv";

            final String outputFilename = "r.out";
            final int addIntervalSeconds = 1;
            final int drainIntervalSeconds = 5;

            final FileBasedTextBatch batch = new FileBasedTextBatch(Paths.get(workingDirectory));
            // two threads, so that a slow drain does not hold up the producer
            final ScheduledExecutorService executor = Executors.newScheduledThreadPool(2);

            final ScheduledFuture<?> producer = executor.scheduleAtFixedRate(
                () -> batch.add(
                    // adding formatted date/time to imitate another CSV line
                    LocalDateTime.now().format(DateTimeFormatter.ISO_DATE_TIME)
                ),
                0, addIntervalSeconds, TimeUnit.SECONDS);

            final ScheduledFuture<?> consumer = executor.scheduleAtFixedRate(
                () -> batch.drainTo(Paths.get(outputDirectory, outputFilename)),
                0, drainIntervalSeconds, TimeUnit.SECONDS);

            try {
                // a periodic task never completes normally, so for the demo
                // this waits 30 seconds and then times out
                producer.get(30, TimeUnit.SECONDS);
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            catch (ExecutionException e) {
                System.err.println("Producer failed: " + e);
            }
            catch (TimeoutException e) {
                System.out.println("Finishing producer/consumer...");
                producer.cancel(true);
                consumer.cancel(true);
            }
            executor.shutdown();
        }

        static class FileBasedTextBatch {
            private final Object lock = new Object();
            private final Path workingDir;
            private Output output; // guarded by lock

            public FileBasedTextBatch(Path workingDir) throws IOException {
                this.workingDir = workingDir;
                output = new Output(this.workingDir);
            }

            /**
             * Adds another line of text to the batch.
             */
            public void add(String textLine) {
                synchronized (lock) {
                    output.writer.println(textLine);
                }
            }

            /**
             * Moves the currently collected batch to the file at the specified path.
             * The file is overwritten if it exists.
             */
            public void drainTo(Path targetPath) {
                try {
                    final long startNanos = System.nanoTime();
                    final Output previous = getAndSwapOutput();
                    final long elapsedMillis =
                        TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
                    System.out.printf("Replaced the output in %d millis%n", elapsedMillis);
                    // the producer now writes to the new file, so the old one can be
                    // closed and moved without any locking
                    previous.close();
                    // with ATOMIC_MOVE the other options are ignored and replacing an
                    // existing target is implementation specific (POSIX rename replaces it)
                    Files.move(
                        previous.file,
                        targetPath,
                        StandardCopyOption.ATOMIC_MOVE,
                        StandardCopyOption.REPLACE_EXISTING
                    );
                }
                catch (IOException e) {
                    System.err.println("Failed to drain: " + e);
                    // note: an exception suppresses all further scheduled runs of this task
                    throw new IllegalStateException(e);
                }
            }

            /**
             * Replaces the current output with a new one, returning the old one.
             * The method is supposed to execute very quickly to avoid delaying the producer thread.
             */
            private Output getAndSwapOutput() throws IOException {
                // create the new temp file before taking the lock,
                // so the producer only ever waits for a reference swap
                final Output next = new Output(this.workingDir);
                synchronized (lock) {
                    final Output prev = this.output;
                    this.output = next;
                    return prev;
                }
            }
        }

        static class Output {
            final Path file;
            final PrintWriter writer;

            Output(Path workingDir) throws IOException {
                // fast on local file systems; if it ever becomes a bottleneck,
                // UUID-based file names could be used instead
                this.file = Files.createTempFile(workingDir, "csv", ".tmp");
                this.writer = new PrintWriter(Files.newBufferedWriter(this.file));
            }

            void close() {
                // the original `if` lacked braces, so close() ran unconditionally;
                // writer is final and never null, so no check is needed
                writer.flush();
                writer.close();
            }
        }
    }
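For steps 2 and 3 of the question (process the data in R and wait for its output), the consumer could launch R right after the file has been moved and block until it finishes. A minimal sketch using `ProcessBuilder`; the class `ExternalStep`, the method `runAndWait`, and the command `Rscript process.R ./data/csv/r.out` are hypothetical. While R runs, the consumer task is blocked; if a run takes longer than the drain interval, `scheduleAtFixedRate` starts the next drain late rather than concurrently.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for running an external step such as an R script. */
final class ExternalStep {
    /**
     * Starts the command and waits for it to finish, returning its exit code.
     * Example (hypothetical script):
     *   runAndWait(List.of("Rscript", "process.R", "./data/csv/r.out"), 10, TimeUnit.MINUTES);
     */
    static int runAndWait(List<String> command, long timeout, TimeUnit unit)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)                       // merge stderr into stdout
                .redirectOutput(ProcessBuilder.Redirect.INHERIT) // print to our console, so the pipe never fills up
                .start();
        if (!process.waitFor(timeout, unit)) {
            process.destroyForcibly();
            throw new IOException("Timed out: " + command);
        }
        return process.exitValue();
    }
}
```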
