
Read from a large file and write to multiple files with Java

I have a file A.txt with 100,000,000 records, numbered 1 to 100000000, one record per line. I have to read file A and write to files B and C: even lines go to file B and odd lines go to file C. The total read and write time must be under 40 seconds. Below is the code I already have, but it takes more than 50 seconds to run. Does anyone have another solution that reduces the runtime?

Threading.java

import java.io.*;
import java.util.concurrent.LinkedBlockingQueue;

public class Threading implements Runnable {
    LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
    String file;
    volatile boolean stop = false; // volatile so the writer thread sees the update made by Stop()
    
    public Threading(String file) {
        this.file = file;
    }

    public void addQueue(String row) {
        queue.add(row);
    }
    
    public void Stop() {
        stop = true;
    }
    
    public void run() {
        try {
            BufferedWriter bw = new BufferedWriter(new FileWriter(file));
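            // Keep draining until stop is requested and the queue is empty,
            // so rows still queued when Stop() is called are not lost.
            while (!stop || !queue.isEmpty()) {
                try {
                    // poll with a timeout instead of take(), which would block forever on an empty queue
                    String row = queue.poll(100, java.util.concurrent.TimeUnit.MILLISECONDS);
                    if (row != null) {
                        bw.write(row + "\n");
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }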
            bw.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

ThreadCreate.java

// I used 2 threads to write to 2 files B and C

import java.io.*;
import java.util.List;

public class ThreadCreate {
    public void startThread(File file) {
        Threading t1 = new Threading("B.txt");
        Threading t2 = new Threading("C.txt");
        Thread td1 = new Thread(t1);
        Thread td2 = new Thread(t2);
        td1.start();
        td2.start();
        
        try {
            BufferedReader br = new BufferedReader(new FileReader(file));
            String line;
            long start = System.currentTimeMillis();
            while ((line = br.readLine()) != null) {
                if (Integer.parseInt(line) % 2 == 0) {
                    t1.addQueue(line);
                } else {
                    t2.addQueue(line);
                }
            }
            t1.Stop();
            t2.Stop();
            br.close();
            long end = System.currentTimeMillis();
            System.out.println("Time to read file A and write file B, C: " + ((end - start)/1000) + "s");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Main.java

import java.io.*;

public class Main {
    public static void main(String[] args) throws IOException {
        File file = new File("A.txt");
        
        //Write file B, C
        ThreadCreate t = new ThreadCreate();
        t.startThread(file);
    }
}

Why are you making threads? That just slows things down. Threads are useful if the bottleneck is either the calculation itself or the blocking nature of the operation, and they only hurt if it is neither. Here, it is neither: the CPU is mostly idle (the bottleneck will be the disk), and the nature of what it is blocking on means that multithreading does not help either. Telling a single SSD to write two boatloads of bytes in parallel is probably no faster (if anything slower, as it needs to bounce back and forth). If the target disk is a spinning disk, it is way slower: the write head cannot clone itself to go any faster, and by making the code multithreaded you waste a ton of time asking the write head to bounce back and forth between the different write locations.

There's nothing that immediately strikes me as ripe for significant speedups.

Sometimes, writing a ton of data to a disk just takes 50 seconds. If that's not acceptable, buy a faster disk.
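For comparison, here is a minimal single-threaded sketch of the same task, with no queues and no writer threads. It is not from the original answer and makes no promise about hitting the 40-second target. The class name `SingleThreadSplit`, the 1 MiB buffer size, and the last-digit parity check (used in place of `Integer.parseInt`) are assumptions made for illustration; the parity shortcut only holds if every line is a plain decimal integer.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class SingleThreadSplit {
    // Hypothetical buffer size (1 MiB); tune it for the actual disk.
    private static final int BUFFER_SIZE = 1 << 20;

    // Assumes every non-empty line is a decimal integer, so the parity of the
    // value equals the parity of its last digit (no Integer.parseInt needed).
    static void split(Path in, Path even, Path odd) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                 new InputStreamReader(Files.newInputStream(in), StandardCharsets.US_ASCII), BUFFER_SIZE);
             BufferedWriter evenOut = new BufferedWriter(
                 new OutputStreamWriter(Files.newOutputStream(even), StandardCharsets.US_ASCII), BUFFER_SIZE);
             BufferedWriter oddOut = new BufferedWriter(
                 new OutputStreamWriter(Files.newOutputStream(odd), StandardCharsets.US_ASCII), BUFFER_SIZE)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                char last = line.charAt(line.length() - 1);
                BufferedWriter out = ((last - '0') % 2 == 0) ? evenOut : oddOut;
                out.write(line);
                out.write('\n');
            }
        } // try-with-resources flushes and closes all three files
    }

    public static void main(String[] args) throws IOException {
        Path in = Paths.get("A.txt");
        if (!Files.exists(in)) {
            System.err.println("A.txt not found in the working directory");
            return;
        }
        long start = System.nanoTime();
        split(in, Paths.get("B.txt"), Paths.get("C.txt"));
        System.out.println("Elapsed: " + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}
```

Because the timer stops only after `split` returns, the measured time includes flushing both output files, which the threaded version in the question does not wait for.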

Try memory-mapped files:

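import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

byte[] buffer = "foo bar foo bar text\n".getBytes();
int number_of_lines = 100000000;

FileChannel file = new RandomAccessFile("writeFIle.txt", "rw").getChannel();
// Use long arithmetic for the mapping size; a single mapping cannot exceed Integer.MAX_VALUE bytes
ByteBuffer wrBuf = file.map(FileChannel.MapMode.READ_WRITE, 0, (long) buffer.length * number_of_lines);
for (int i = 0; i < number_of_lines; i++) {
    wrBuf.put(buffer);
}
file.close();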

On my computer (Dell, i7 processor, SSD, 32 GB RAM), this code took a little over half a minute to run.
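The snippet above only writes a fixed line repeatedly. As a rough sketch of how a mapped file might be applied to the question's actual task, the code below maps A.txt read-only in chunks and routes each line to one of two buffered output streams by the parity of its last digit. The class name `MappedSplit`, the 1 GiB chunk size, the 64-byte line limit, and the last-digit parity check are all assumptions for illustration, and no timing is claimed for it.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedSplit {
    // Map at most 1 GiB at a time; a single mapping is limited to Integer.MAX_VALUE bytes.
    private static final long CHUNK = 1L << 30;

    static void split(Path in, Path even, Path odd) throws IOException {
        try (FileChannel ch = FileChannel.open(in, StandardOpenOption.READ);
             OutputStream evenOut = new BufferedOutputStream(Files.newOutputStream(even), 1 << 20);
             OutputStream oddOut = new BufferedOutputStream(Files.newOutputStream(odd), 1 << 20)) {
            long size = ch.size();
            byte[] line = new byte[64]; // assumption: no line is longer than 64 bytes
            int len = 0;                // carried across chunks, so lines may span a chunk boundary
            for (long pos = 0; pos < size; pos += CHUNK) {
                MappedByteBuffer buf =
                    ch.map(FileChannel.MapMode.READ_ONLY, pos, Math.min(CHUNK, size - pos));
                while (buf.hasRemaining()) {
                    byte x = buf.get();
                    if (x == '\n') {
                        len = emit(line, len, evenOut, oddOut);
                    } else if (x != '\r') {
                        line[len++] = x;
                    }
                }
            }
            emit(line, len, evenOut, oddOut); // final line without a trailing newline
        }
    }

    // Writes the collected line to the stream matching its last digit's parity
    // (ASCII codes of even digits are even) and returns the new length, 0.
    private static int emit(byte[] line, int len, OutputStream evenOut, OutputStream oddOut)
            throws IOException {
        if (len > 0) {
            OutputStream out = ((line[len - 1] & 1) == 0) ? evenOut : oddOut;
            out.write(line, 0, len);
            out.write('\n');
        }
        return 0;
    }
}
```

Working on raw bytes avoids creating a `String` per record, which may or may not matter once the disk is the bottleneck; measuring on the target machine is the only way to know.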
