
How to read from a txt file and separate the text based on numbers and strings in Java

The program reads from a text file. Each line of the text file starts with a number from -2 to 2, followed by a sentence. The first few lines of the txt file are shown below:

1 Campanella gets the tone just right -- funny in the middle of sad in the middle of hopeful .
-2 Nothing more than an amiable but unfocused bagatelle that plays like a loosely-connected string of acting-workshop exercises .
1 It 's a sharp movie about otherwise dull subjects .
1 ... it 's as comprehensible as any Dummies guide , something even non-techies can enjoy .
-1 -LRB- Green is -RRB- the comedy equivalent of Saddam Hussein , and I 'm just about ready to go to the U.N. and ask permission for a preemptive strike .

The only lines that should be read are those that consist of a number, one space, and then text, in that order. The last two lines should not be considered because they have ... and - respectively before the text. The first three lines, however, are fine.

I have a class called placeholder with the following fields:

public class placeholder implements Comparable<placeholder> {
    protected int score;
    protected String text;

    public placeholder(int score, String text) {
        this.score = score;
        this.text = text;
    }
}

I would like a method called readFile to go through the file line by line and store every line in a list called reviewsDB. Each object in the list will be of type placeholder; the number at the start of the line will be the score value and the words that follow will be the text value. What code can I put in the following area to split each line into the number and the text?

    public static List<placeholder> readFile(String filename) {

        File movieReviews = new File("reviews.txt");

        try {

            Scanner scanner = new Scanner(movieReviews);
            scanner.nextLine();

            List<placeholder> reviewsDB = new ArrayList<placeholder>();

            while (scanner.hasNextLine()) {
                int sentenceScore = 0;
                String sentenceText = null;

                //code to separate the number and text in each line here
                placeholder newSentence = new placeholder(sentenceScore, sentenceText);

                reviewsDB.add(newSentence);
            }

            return reviewsDB;
        }

        catch (Exception e) {

            System.out.println("Something went wrong");

            return null;
        }

    }
  • Read the file into a stream using Files#lines
  • Filter the lines that meet your criteria using the regex "-?\\d\\s\\w+.*"
  • Split each line into two parts using String#split with whitespace as the delimiter, limiting the resulting array to a length of two: line.split("\\s",2)
  • Collect the stream into a list of placeholder objects

Example code:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class Example {

    public static void main(String[] args) {
        List<placeholder> list= readFile("path to your file");
        list.forEach(System.out::println);
    }
    public static List<placeholder> readFile(String filename) {
        List<placeholder> reviewsDB = new ArrayList<>();
        try (Stream<String> content = Files.lines(Paths.get(filename))) {
            reviewsDB = content
                    .filter(line -> line.matches("-?\\d\\s\\w+.*"))
                    .map(line -> line.split("\\s",2))
                    .map(arr -> new placeholder(Integer.parseInt(arr[0]), arr[1]))
                    .collect(Collectors.toList());
        } catch (IOException ex) {
            ex.printStackTrace();
        }
        return reviewsDB;
    }
}
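
To see which lines the filter expression keeps, here is a small self-contained sketch that applies the same regex to shortened versions of the five sample lines from the question. The class name LineFilterDemo and the helper keepValid are hypothetical names used only for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class LineFilterDemo {

    // Same filter expression as in the answer above: optional minus sign,
    // one digit, one whitespace character, then text starting with a word character.
    static final String LINE_REGEX = "-?\\d\\s\\w+.*";

    // Returns only the lines that match LINE_REGEX (hypothetical helper for this demo).
    static List<String> keepValid(List<String> lines) {
        return lines.stream()
                .filter(line -> line.matches(LINE_REGEX))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Shortened versions of the sample lines from the question.
        List<String> sample = Arrays.asList(
                "1 Campanella gets the tone just right .",
                "-2 Nothing more than an amiable but unfocused bagatelle .",
                "1 It 's a sharp movie about otherwise dull subjects .",
                "1 ... it 's as comprehensible as any Dummies guide .",
                "-1 -LRB- Green is -RRB- the comedy equivalent of Saddam Hussein .");
        // Only the first three lines are printed; the last two are rejected
        // because the character after the space is not a word character.
        keepValid(sample).forEach(System.out::println);
    }
}
```

Because the regex allows only a single digit, a line such as "12 some text" would also be rejected, which fits scores in the range -2 to 2.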

You can use a regex; pattern matching is the best fit here. The number may have any number of digits and may be negative. If a leading + can also appear, replace -? with [-+]? at the start of the pattern.

This assumes the numbers are not written in scientific notation.

while (scanner.hasNextLine()) {
    String line = scanner.nextLine();
    Matcher m = p.matcher(line); // p is the compiled Pattern shown below
    if (m.matches()) {
        // group(1) is the signed number, group(2) is the rest of the line
        int sentenceScore = Integer.parseInt(m.group(1));
        String sentenceText = m.group(2).trim();
        reviewsDB.add(new placeholder(sentenceScore, sentenceText));
    }
}

I used the regex below:

Pattern p = Pattern.compile("^(-?\\d+)(.*)");

The leading - is optional, which is what -? means. It is followed by one or more digits: \d+

The second group is any characters after the first group: (.*)

You can experiment with your own inputs; I tested the pattern against your sample input.
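
Note that the pattern above accepts any characters after the digits, so a line such as "1 ... it 's" would still match. The following is only a sketch of a stricter variant, assuming the rule from the question (number, exactly one space, then text that starts with a letter or digit). The class ScoreLineParser, the constant STRICT, and the method parse are hypothetical names, not part of the original answer.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScoreLineParser {

    // Assumed stricter variant of the pattern above: signed integer, exactly one
    // space, then text whose first character is a letter, digit or underscore.
    static final Pattern STRICT = Pattern.compile("^(-?\\d+) (\\w.*)$");

    // Returns {score, text} as strings, or empty when the line does not fit the format.
    static Optional<String[]> parse(String line) {
        Matcher m = STRICT.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new String[] { m.group(1), m.group(2) });
    }

    public static void main(String[] args) {
        String[] samples = {
                "-2 Nothing more than an amiable but unfocused bagatelle .",
                "1 ... it 's as comprehensible as any Dummies guide ."
        };
        for (String s : samples) {
            Optional<String[]> parts = parse(s);
            if (parts.isPresent()) {
                // Convert the captured number only after the line has matched.
                int score = Integer.parseInt(parts.get()[0]);
                System.out.println(score + " | " + parts.get()[1]);
            } else {
                System.out.println("skipped: " + s);
            }
        }
    }
}
```

Capturing the text in its own group avoids a separate split or trim step, since the single space is consumed by the pattern itself.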

You can use Files.readAllLines(Path, Charset) to get a List of Strings representing the content of your file. Then you can iterate through the list and use String.split(regex, limit) to split each string into parts. Finally, you can create a new placeholder object from the parts.
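
The approach just described might look roughly like the sketch below. It assumes UTF-8 input and uses a nested Review class as a stand-in for the question's placeholder class; ReadAllLinesDemo, Review, and readReviews are hypothetical names, and the extra validity check reflects the asker's rule rather than anything stated in this answer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReadAllLinesDemo {

    // Minimal stand-in for the question's placeholder class (assumption for this demo).
    static class Review {
        final int score;
        final String text;
        Review(int score, String text) { this.score = score; this.text = text; }
    }

    // Reads every line, splits once on the first space, and skips lines that
    // do not look like "<number> <text starting with a letter or digit>".
    static List<Review> readReviews(Path file) throws IOException {
        List<Review> reviews = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String[] parts = line.split(" ", 2);
            if (parts.length == 2 && parts[0].matches("-?\\d+")
                    && !parts[1].isEmpty() && Character.isLetterOrDigit(parts[1].charAt(0))) {
                reviews.add(new Review(Integer.parseInt(parts[0]), parts[1]));
            }
        }
        return reviews;
    }

    public static void main(String[] args) throws IOException {
        // Write two sample lines to a temporary file so the demo needs no external input.
        Path tmp = Files.createTempFile("reviews", ".txt");
        Files.write(tmp, Arrays.asList(
                "1 It 's a sharp movie about otherwise dull subjects .",
                "-1 -LRB- Green is -RRB- the comedy equivalent of Saddam Hussein ."),
                StandardCharsets.UTF_8);
        for (Review r : readReviews(tmp)) {
            System.out.println(r.score + " -> " + r.text);
        }
        Files.delete(tmp);
    }
}
```

Files.readAllLines loads the whole file into memory, which is fine for small review files; for very large inputs the streaming Files.lines approach is usually the better fit.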

