
Read specific data from a .txt file in Java

I have a problem. I'm trying to read a large .txt file, but I don't need every piece of data that's inside.

My .txt file looks something like this:

8000000 abcdefg hijklmn word word letter

I only need, let's say, the number and the first two text fields ("abcdefg" and "hijklmn"), and I want to write them to another file. I don't know how to read and write just the data that I need.

Here is my code so far:

    BufferedReader br = new BufferedReader(new FileReader("position2.txt"));
    BufferedWriter bw = new BufferedWriter(new FileWriter("position.txt"));
    String line;

    while ((line = br.readLine())!= null){
        if(line.isEmpty() || line.trim().equals("") || line.trim().equals("\n")){
            continue;
        }else{
            //bw.write(line + "\n");
            String[] data = line.split(" ");
            bw.write(data[0] + " " + data[1] + " " + data[2] + "\n");
        }

    }

    br.close();
    bw.close();

}

Can you give me some suggestions? Thanks in advance.

UPDATE: My .txt files are a bit weird. The code above works great when there is exactly one " " between the fields. My files can have a \t, several spaces, or a \t and some spaces between the words. How can I proceed now?

Depending on the complexity of your data, you have a few options.

If the lines are simple space-separated values like shown, the simplest is to split the text, and write the values you want to keep to the new file:

try (BufferedReader br = new BufferedReader(new FileReader("text.txt"));
     BufferedWriter bw = new BufferedWriter(new FileWriter("data.txt"))) {
    String line;
    while ((line = br.readLine()) != null) {
        String[] values = line.split(" ");
        if (values.length >= 3)
            bw.write(values[0] + ' ' + values[1] + ' ' + values[2] + '\n');
    }
}
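
The update to the question says the fields may be separated by tabs or by runs of spaces, which `split(" ")` does not handle. Here is a minimal sketch of one way to cope with that, assuming any run of whitespace counts as a single separator. The class and method names (`WhitespaceSplit`, `firstThree`) are made up for illustration, and the sample input is an in-memory string rather than a real file:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class WhitespaceSplit {
    // Returns the first three whitespace-separated fields joined by single
    // spaces, or null if the line has fewer than three fields.
    static String firstThree(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return null;
        }
        // "\\s+" matches one or more spaces, tabs, or other whitespace.
        String[] values = trimmed.split("\\s+");
        if (values.length < 3) {
            return null;
        }
        return values[0] + " " + values[1] + " " + values[2];
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data: mixed tabs and spaces, a blank line, a short line.
        String sample = "8000000 \t abcdefg   hijklmn word word letter\n\n1 two\n";
        try (BufferedReader br = new BufferedReader(new StringReader(sample))) {
            String line;
            while ((line = br.readLine()) != null) {
                String kept = firstThree(line);
                if (kept != null) {
                    System.out.println(kept);
                }
            }
        }
    }
}
```

To use it on real files, the `StringReader` can be swapped for a `FileReader` and the `println` for a `BufferedWriter`, as in the snippet above.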

If the values might be more complex, you could use a regular expression:

Pattern p = Pattern.compile("^(\\d+ \\w+ \\w+)");
try (BufferedReader br = new BufferedReader(new FileReader("text.txt"));
     BufferedWriter bw = new BufferedWriter(new FileWriter("data.txt"))) {
    String line;
    while ((line = br.readLine()) != null) {
        Matcher m = p.matcher(line);
        if (m.find())
            bw.write(m.group(1) + '\n');
    }
}

This ensures that the first value is digits only, and that the second and third values are word characters only (a-z, A-Z, _, 0-9).
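
The same regular-expression idea can probably be adapted to the tab-and-multiple-space case from the question's update. The sketch below assumes that any run of whitespace separates the fields and that leading whitespace should be ignored; `RegexFields` and `extract` are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexFields {
    // Optional leading whitespace, then digits, then two word tokens, each
    // separated by one or more whitespace characters (spaces or tabs).
    private static final Pattern FIELDS =
            Pattern.compile("^\\s*(\\d+)\\s+(\\w+)\\s+(\\w+)");

    // Returns the three captured fields joined by single spaces,
    // or null if the line does not start with the expected shape.
    static String extract(String line) {
        Matcher m = FIELDS.matcher(line);
        if (!m.find()) {
            return null;
        }
        return m.group(1) + " " + m.group(2) + " " + m.group(3);
    }

    public static void main(String[] args) {
        System.out.println(extract("8000000\tabcdefg \t hijklmn word word letter"));
        System.out.println(extract("not-a-number abcdefg hijklmn"));
    }
}
```

Capturing each field in its own group lets the output use a single space regardless of what separated the fields in the input.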

else {
     String[] res = line.split(" ");
     bw.write(res[0] + " " + res[1] + " " + res[2] + "\n"); // the first three words...
}

Assuming all lines of your text file follow the structure you described, you could do this (replace FILE_PATH with your actual file path):

// Needs: java.io.File, java.io.FileNotFoundException, java.io.PrintWriter, java.util.Scanner
public static void main(String[] args) {
    // try-with-resources closes both files even if an exception is thrown
    try (Scanner reader = new Scanner(new File("FILE_PATH/myfile.txt"));
         PrintWriter writer = new PrintWriter(new File("FILE_PATH/myfile2.txt"))) {
        while (reader.hasNextLine()) {
            String[] tokens = reader.nextLine().split(" ");
            if (tokens.length >= 3) { // skip blank or short lines
                writer.println(tokens[0] + ", " + tokens[1] + ", " + tokens[2]);
            }
        }
    } catch (FileNotFoundException ex) {
        System.out.println("Error: " + ex.getMessage());
    }
}

You'll get something like: word0, word1, word2

If your files are really huge (above 50-100 MB, maybe several GB), and you are sure that the first word is a number followed by the two words you need, I would suggest reading one line at a time and iterating through that string, stopping when you find the third space.

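String str = br.readLine();       // br: a BufferedReader over the input file
int num_spaces = 0, cnt = 0;
String[] arr = {"", "", ""};      // start from empty strings, not null
while (num_spaces < 3 && cnt < str.length()) {
    if (str.charAt(cnt) == ' ') {
        num_spaces++;             // a space ends the current word
    } else {
        arr[num_spaces] += str.charAt(cnt);
    }
    cnt++;                        // advance, otherwise the loop never ends
}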

If your data is only a couple of MB, or the lines contain a lot of numbers, there is no need to worry about iterating char by char. Just read line by line, split each line, and check the words as described in the other answers.
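
A possible middle ground between full splitting and a hand-written character loop is the two-argument `String.split(regex, limit)`, which stops splitting once the limit is reached and leaves the rest of the line in the last element. This is only a sketch: `LimitedSplit` and `firstThree` are hypothetical names, and whether it is measurably faster on your files is an assumption you would need to test:

```java
class LimitedSplit {
    // Returns the first three whitespace-separated fields joined by single
    // spaces, or null if the line has fewer than three fields.
    static String firstThree(String line) {
        // Limit 4: at most three splits are performed; the fourth element,
        // if present, holds the unsplit remainder of the line.
        String[] parts = line.trim().split("\\s+", 4);
        if (parts.length < 3) {
            return null;
        }
        return parts[0] + " " + parts[1] + " " + parts[2];
    }

    public static void main(String[] args) {
        System.out.println(firstThree("8000000\tabcdefg   hijklmn word word letter"));
    }
}
```

Unlike the loop that counts single spaces, this version also tolerates tabs and repeated spaces between the fields.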


 