
Read specific data from a .txt file in Java

I have a problem. I'm trying to read a large .txt file, but I don't need every piece of data inside it.

My .txt file looks something like this:

    8000000 abcdefg hijklmn word word letter

I only need, let's say, the number and the first two text fields ("abcdefg" and "hijklmn"), and then I want to write them to another file. I don't know how to read and write just the data I need.

Here is my code so far:

    BufferedReader br = new BufferedReader(new FileReader("position2.txt"));
    BufferedWriter bw = new BufferedWriter(new FileWriter("position.txt"));
    String line;

    while ((line = br.readLine())!= null){
        if(line.isEmpty() || line.trim().equals("") || line.trim().equals("\n")){
            continue;
        }else{
            //bw.write(line + "\n");
            String[] data = line.split(" ");
            bw.write(data[0] + " " + data[1] + " " + data[2] + "\n");
        }

    }

    br.close();
    bw.close();

}

Can you give me some suggestions? Thanks in advance.

UPDATE: My .txt files are a bit weird. The code above works fine when there is exactly one space between the fields, but my files can have a tab (\t), several spaces, or a tab plus some spaces between the words. How can I proceed now?

Depending on the complexity of your data, you have a few options.

If the lines are simple space-separated values, as shown, the simplest approach is to split each line and write the values you want to keep to the new file:

try (BufferedReader br = new BufferedReader(new FileReader("text.txt"));
     BufferedWriter bw = new BufferedWriter(new FileWriter("data.txt"))) {
    String line;
    while ((line = br.readLine()) != null) {
        String[] values = line.split(" ");
        if (values.length >= 3)
            bw.write(values[0] + ' ' + values[1] + ' ' + values[2] + '\n');
    }
}
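Given the question's update (tabs and runs of spaces between fields), one variant of the split approach that should cope with that is to trim the line and split on the regular expression `\\s+`, which matches one or more whitespace characters (spaces, tabs, and so on). This is a sketch, not the answer's original code: the class name `WhitespaceFields` and the helper `firstThree` are hypothetical, the file names text.txt and data.txt are placeholders, and lines with fewer than three fields are skipped rather than causing an exception.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class WhitespaceFields {

    // Returns the first three whitespace-separated fields joined by single spaces,
    // or null if the line has fewer than three fields (for example, a blank line).
    static String firstThree(String line) {
        // trim() first: otherwise leading whitespace produces an empty first element.
        // "\\s+" treats any run of spaces and/or tabs as a single separator.
        String[] values = line.trim().split("\\s+");
        if (values.length < 3) {
            return null;
        }
        return values[0] + " " + values[1] + " " + values[2];
    }

    public static void main(String[] args) throws IOException {
        // Placeholder file names; adjust to your own paths
        try (BufferedReader br = new BufferedReader(new FileReader("text.txt"));
             BufferedWriter bw = new BufferedWriter(new FileWriter("data.txt"))) {
            String line;
            while ((line = br.readLine()) != null) {
                String kept = firstThree(line);
                if (kept != null) {
                    bw.write(kept);
                    bw.newLine(); // platform line separator
                }
            }
        }
    }
}
```

Normalizing the output to single spaces means the written file is clean even when the input mixes tabs and spaces.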

If the values might be more complex, you could use a regular expression:

Pattern p = Pattern.compile("^(\\d+ \\w+ \\w+)");
try (BufferedReader br = new BufferedReader(new FileReader("text.txt"));
     BufferedWriter bw = new BufferedWriter(new FileWriter("data.txt"))) {
    String line;
    while ((line = br.readLine()) != null) {
        Matcher m = p.matcher(line);
        if (m.find())
            bw.write(m.group(1) + '\n');
    }
}

This ensures that the first value consists only of digits, and that the second and third values consist only of word characters (a-z, A-Z, _, 0-9).
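The pattern above requires exactly one space between fields, so it will not match the tab-separated lines described in the question's update. One possible adjustment (an assumption, not part of the original answer) is to replace each literal space with `\\s+`, allow optional leading whitespace, capture the three fields as separate groups, and rejoin them with single spaces. `RegexFields` and `extract` are hypothetical names used for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexFields {

    // A digits-only field, then two word-character fields, separated by any
    // run of whitespace (spaces and/or tabs). Leading whitespace is allowed.
    private static final Pattern FIELDS = Pattern.compile("^\\s*(\\d+)\\s+(\\w+)\\s+(\\w+)");

    // Returns "number word1 word2" normalized to single spaces,
    // or null if the line does not have that shape.
    static String extract(String line) {
        Matcher m = FIELDS.matcher(line);
        if (!m.find()) {
            return null;
        }
        return m.group(1) + " " + m.group(2) + " " + m.group(3);
    }

    public static void main(String[] args) {
        String[] samples = {
            "8000000 abcdefg hijklmn word word letter",
            "8000000\t  abcdefg \thijklmn word",
            "not-a-number abc def"
        };
        // Prints "8000000 abcdefg hijklmn" twice, then "null" for the third sample
        for (String s : samples) {
            System.out.println(extract(s));
        }
    }
}
```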

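// Inside the question's while loop: skip blank lines, otherwise keep the first three words
if (line.trim().isEmpty()) {
    continue;
} else {
    String[] res = line.split(" ");
    bw.write(res[0] + " " + res[1] + " " + res[2] + "\n"); // the first three words...
}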

Assuming all lines of your text file follow the structure you described, you could do this (replace FILE_PATH with your actual file path):

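import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintWriter;
import java.util.Scanner;

public class ExtractFields {
    public static void main(String[] args) {
        // try-with-resources closes the reader and the writer even if an error occurs
        try (Scanner reader = new Scanner(new File("FILE_PATH/myfile.txt"));
             PrintWriter writer = new PrintWriter(new File("FILE_PATH/myfile2.txt"))) {
            while (reader.hasNextLine()) {
                String line = reader.nextLine();
                String[] tokens = line.split(" ");
                // Skip blank or short lines instead of throwing ArrayIndexOutOfBoundsException
                if (tokens.length < 3) {
                    continue;
                }
                writer.println(tokens[0] + ", " + tokens[1] + ", " + tokens[2]);
            }
        } catch (FileNotFoundException ex) {
            System.out.println("Error: " + ex.getMessage());
        }
    }
}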

You'll get something like: word0, word1, word2

If your files are really huge (above 50-100 MB, maybe GBs), and you are sure that the first word is a number and you need the two words after it, I would suggest reading one line at a time and iterating through that string character by character, stopping when you find the third space.

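String str = br.readLine(); // br: the BufferedReader from the question
int numSpaces = 0, cnt = 0;
StringBuilder[] arr = { new StringBuilder(), new StringBuilder(), new StringBuilder() };
// Copy characters into arr[0..2] until the third space or the end of the line
while (str != null && numSpaces < 3 && cnt < str.length()) {
    char c = str.charAt(cnt);
    if (c == ' ') {
        numSpaces++;
    } else {
        arr[numSpaces].append(c);
    }
    cnt++;
}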
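A complete, runnable version of the character-by-character idea might look like the sketch below. It rests on a few assumptions: fields are separated by spaces and/or tabs (per the question's update), a run of separators counts as one, and the caller asks for a fixed number of fields. `CharScanFields` and `firstFields` are hypothetical names. Note that `readLine()` still reads the whole line into memory; the only saving is not splitting the rest of the line, so whether this is measurably faster than `split` on your files is worth testing rather than assuming.

```java
import java.util.ArrayList;
import java.util.List;

public class CharScanFields {

    // Scans a line character by character and collects at most `count` fields.
    // Any run of spaces/tabs counts as one separator, and the loop stops as soon
    // as the last wanted field has ended, so the rest of the line is not examined.
    static List<String> firstFields(String line, int count) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < line.length() && fields.size() < count; i++) {
            char c = line.charAt(i);
            if (c == ' ' || c == '\t') {
                // A separator ends the current field; extra separators are ignored
                if (current.length() > 0) {
                    fields.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        // The line may end right after a field, with no trailing separator
        if (current.length() > 0 && fields.size() < count) {
            fields.add(current.toString());
        }
        return fields; // may hold fewer than `count` fields; the caller should check
    }

    public static void main(String[] args) {
        // Prints [8000000, abcdefg, hijklmn]
        System.out.println(firstFields("8000000\t abcdefg   hijklmn word word letter", 3));
    }
}
```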

If your data is only a couple of MB, or has a lot of numbers inside, there is no need to worry about iterating char by char. Just read line by line, split the lines, and then check the words as mentioned.
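A minimal sketch of the "split, then check the words" approach, assuming the layout from the question: a numeric first field followed by at least two more fields. `CheckWords` and `looksLikeRecord` are hypothetical names; lines that fail the check are reported and skipped instead of being written.

```java
public class CheckWords {

    // Hypothetical check for the question's layout:
    // a digits-only first field followed by at least two more fields.
    static boolean looksLikeRecord(String[] tokens) {
        return tokens.length >= 3 && tokens[0].matches("\\d+");
    }

    public static void main(String[] args) {
        String[] lines = { "8000000 abcdefg hijklmn word word letter", "header line", "" };
        for (String line : lines) {
            // trim, then split on any run of spaces and/or tabs
            String[] tokens = line.trim().split("\\s+");
            if (looksLikeRecord(tokens)) {
                System.out.println(tokens[0] + " " + tokens[1] + " " + tokens[2]);
            } else {
                System.out.println("skipped: \"" + line + "\"");
            }
        }
    }
}
```

For the three sample lines this prints `8000000 abcdefg hijklmn`, then `skipped: "header line"`, then `skipped: ""`.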
