How to read formatted data from a text file in Java

Question

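So, I have been working on this assignment for the past week, and one of the things I have to do in it is read formatted data from a text file. By formatted I mean something like this: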

{
    Marsha      1234     Florida   1268
    Jane        1523     Texas     4456
    Mark        7253     Georgia   1234
}

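(Note: this is just an example, not the actual data from my assignment.)

I have been trying to solve this on my own. I tried reading each line as a string, using .substring() to get certain parts of that string and put them into an array, then fetching those parts from the array by index and printing them to the screen. I have tried several variations of this idea, but none of them worked: it either ends in an error or outputs the data in a weird way. The assignment is due tomorrow and I don't know what to do. If anyone could give me some help with this, it would be much appreciated.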

Answer 1

For the example you gave, splitting the line with the regular expression pattern \\s+ will work:

String s = "Marsha      1234     Florida   1268";
String[] fields = s.split("\\s+");

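The result is an array containing 4 elements: "Marsha", "1234", "Florida" and "1268".

The pattern I used matches one or more whitespace characters; see the JavaDocs of Pattern for details and other options.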
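Since the question is about a file rather than a single string, here is a minimal sketch of applying the same split to every line of a file. The class name SplitLinesFromFile, the helper names and the temporary sample file are assumptions made for illustration; the sketch also assumes that lines holding only "{" or "}" should be skipped, as in the sample from the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitLinesFromFile {

    // Splits each data line into whitespace-separated fields.
    // Blank lines and lines holding only "{" or "}" are skipped (assumption).
    static List<String[]> toRows(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            // Trim first: leading whitespace would otherwise yield an empty first field.
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.equals("{") || trimmed.equals("}")) {
                continue;
            }
            rows.add(trimmed.split("\\s+"));
        }
        return rows;
    }

    // Reads the whole file into memory and delegates to toRows.
    static List<String[]> readRows(Path file) throws IOException {
        return toRows(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file, written to a temp location so the sketch is self-contained.
        Path file = Files.createTempFile("people", ".txt");
        Files.write(file, Arrays.asList(
                "{",
                "    Marsha      1234     Florida   1268",
                "    Jane        1523     Texas     4456",
                "}"), StandardCharsets.UTF_8);

        for (String[] fields : readRows(file)) {
            System.out.println(String.join(" | ", fields));
        }
        Files.delete(file);
    }
}
```

Reading all lines at once is fine for small assignment-sized files; for large files a BufferedReader loop would avoid holding everything in memory.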

Another approach is to define a pattern that the line as a whole has to match, and capture the groups you are interested in:

String s = "Marsha      1234     Florida   1268";

Pattern pattern = Pattern.compile("(\\w+)\\s+(\\d+)\\s+(\\w+)\\s+(\\d+)");
Matcher matcher = pattern.matcher(s);

if (!matcher.matches())
    throw new IllegalArgumentException("line does not match the expected pattern"); //or do whatever else is appropriate for your use case

String name = matcher.group(1);
String id = matcher.group(2);
String state = matcher.group(3);
String whatever = matcher.group(4);

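This pattern requires the second and fourth groups to consist of digits only.

Note, however, that if your data can also contain whitespace, both approaches break down; in that case you need to use different patterns.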
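As one possible "different pattern", the sketch below assumes that columns are always separated by at least two whitespace characters, while a single space may appear inside a text field. That separator rule, the class name SpacedFieldsDemo and the sample line are assumptions, not something the question guarantees.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpacedFieldsDemo {

    // Assumption: two or more whitespace characters separate columns,
    // so a single space inside a name or state is treated as part of the field.
    private static final Pattern LINE =
            Pattern.compile("(.+?)\\s{2,}(\\d+)\\s{2,}(.+?)\\s{2,}(\\d+)");

    // Returns the four captured fields, or throws if the line has another shape.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("line does not match the expected pattern: " + line);
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] fields = parse("Mary Ann    1234     North Carolina   1268");
        // prints: Mary Ann | 1234 | North Carolina | 1268
        System.out.println(String.join(" | ", fields));
    }
}
```

The text groups are reluctant (.+?) so that each one stops at the first wide gap instead of swallowing the rest of the line.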

Answer 2

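First, you have to know the format of your file. As in your example, it starts with { and ends with }. What is the data separator? For example, the separator could be a semicolon, whitespace, etc. Once you know this, you can start building the application. For your example, I would write something like this: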

import java.util.ArrayList;
import java.util.List;

public class MainClass
{

public static void main(String[] args)
{
    String s = "{\r\n"+
               "Marsha      1234     Florida   1268\r\n" + 
               "Jane        1523     Texas     4456\r\n" + 
               "Mark        7253     Georgia   1234\r\n"+
               "}\r\n";

    String[] rows = s.split("\r\n");

    //Here we will keep everything except the first and the last row
    List<String> importantRows = new ArrayList<>(rows.length-2);
    //let's assume that we do not need the first ("{") and the last ("}") row
    for(int i=0; i<rows.length; i++)
    {
        //String r = rows[i];
        //System.out.println(r);

        //skip index 0 and index rows.length-1
        if(i>0 && i<rows.length-1)
        {
            importantRows.add(rows[i]);
        }

    }

    List<String> importantWords = new ArrayList<>(rows.length-2);
    //Now lets split every 'word' from row
    for(String rowImportantData : importantRows)
    {
        String[] oneRowData = rowImportantData.split(" ");

        //Here one row looks like: [Marsha][][][][][][1234][][][][][Florida][][][1268]
        // We need to remove the empty strings. They appear because there is more
        //than one space in a row. You can use a regex or another approach,
        // but I show this one because you may have other data that you do not need and want to remove.
        for(String data : oneRowData)
        {
            if(!data.trim().isEmpty())
            {
                importantWords.add(data);
            }
            //System.out.println(data);
        }

    }

    //Now we have the words.
    //You must know the rules that apply for this data. Let's assume from your example that you have (Name Number) group
    //If we want to print every group (Name Number) and we have in this state list with [Name][Number][Name][Number]....
    //Then we can print it this way
    for(int i=0; i<importantWords.size()-1; i=i+2)
    {
        System.out.println(importantWords.get(i) + " " + importantWords.get(i+1));
    }

}

}

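This is just an example. You can build the application in many different ways. The important part is to know what the initial state of the information you want to process is, and what result you want to obtain.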
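As one of those other ways, here is a minimal sketch that uses java.util.Scanner instead of manual splitting. It assumes every record is exactly name, number, state, number, that no field contains whitespace, and that "{" and "}" appear as standalone tokens; the class name ScannerDemo and the comma-joined output format are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerDemo {

    // Reads whitespace-separated tokens and groups them into records of four fields.
    static List<String> readRecords(String text) {
        List<String> records = new ArrayList<>();
        try (Scanner in = new Scanner(text)) {
            while (in.hasNext()) {
                String token = in.next();
                if (token.equals("{") || token.equals("}")) {
                    continue; // skip the enclosing braces
                }
                String name = token;
                int id = in.nextInt();     // throws InputMismatchException if not a number
                String state = in.next();
                int code = in.nextInt();
                records.add(name + "," + id + "," + state + "," + code);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "{\n"
                + "    Marsha      1234     Florida   1268\n"
                + "    Jane        1523     Texas     4456\n"
                + "}\n";
        readRecords(text).forEach(System.out::println);
    }
}
```

To read from an actual file, the Scanner can be constructed from a java.nio.file.Path or a java.io.File instead of a String.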

Good luck!

Answer 3

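There are many different ways you can read this formatted file. I suggest you first extract the relevant data from the text as a list of strings, and then split the lines into fields. Here is an example of how to do that with the data sample you provided: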

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CustomTextReader {

    public static void main(String[] args) {
        String text =
                "Marsha      1234     Florida   1268\r\n" + 
                "Jane        1523     Texas     4456\r\n" + 
                "Mark        7253     Georgia   1234";

        //Extract the relevant data from the text as a list of arrays
        //  in which each array is a line, and each element is a field. 
        List<String[]> data = getData(text);
        //Just printing the results
        print(data);
    }

    private static List<String[]> getData(String text) {
        //1. Separate content into lines.
        return Arrays.stream(text.split("\r\n"))
                //2. Separate lines into fields.
                .map(s -> s.split("\\s{2,}"))
                .collect(Collectors.toList());
    }

    private static void print(List<String[]> data) {
        data.forEach(line -> {
            for(String field : line) {
                System.out.print(field + " | ");
            }
            System.out.println();
        });

    }
}

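What matters is knowing what to expect from the data in terms of format. If you know the fields will not contain whitespace, you can split on any run of whitespace in step 2, for example with \\s+ (a plain " " also splits, but leaves empty strings wherever several spaces follow each other). However, if you think the data may contain fields with spaces in them (e.g. "North Carolina"), it is better to use a regular expression such as \\s{2,}, which is what I did in the example above. Hope this helps!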
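The difference between the two patterns is easiest to see on a line whose fields contain single spaces. A small sketch, with a made-up sample line and a hypothetical class name SplitPatternComparison:

```java
import java.util.Arrays;

class SplitPatternComparison {

    // Splits on any run of whitespace: breaks fields that contain a space.
    static String[] splitOnAnyWhitespace(String line) {
        return line.trim().split("\\s+");
    }

    // Splits only on runs of two or more whitespace characters.
    static String[] splitOnWideGaps(String line) {
        return line.trim().split("\\s{2,}");
    }

    public static void main(String[] args) {
        String line = "Mary Ann    1234     North Carolina   1268";
        // prints: [Mary, Ann, 1234, North, Carolina, 1268]
        System.out.println(Arrays.toString(splitOnAnyWhitespace(line)));
        // prints: [Mary Ann, 1234, North Carolina, 1268]
        System.out.println(Arrays.toString(splitOnWideGaps(line)));
    }
}
```

Note that \\s{2,} only helps as long as the columns themselves are always separated by at least two whitespace characters.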

Answer 4

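I do believe that @JoniVR's suggestion would be really helpful, and you should consider using a delimiter for the columns of each line. Currently you would not be able to parse compound data, such as the first name "Mary Ann". Also, since the sample data you provided already has 4 columns, you should have a POJO that represents the data parsed from the file. A conceptual one would look like: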

class MyPojo {

    private String name;
    private int postCode;
    private String state;
    private int cityId;

    public MyPojo(String name, int postCode, String state, int cityId) {
        this.name = name;
        this.postCode = postCode;
        this.state = state;
        this.cityId = cityId;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getPostCode() {
        return postCode;
    }

    public void setPostCode(int postCode) {
        this.postCode = postCode;
    }

    public String getState() {
        return state;
    }

    public void setState(String state) {
        this.state = state;
    }

    public int getCityId() {
        return cityId;
    }

    public void setCityId(int cityId) {
        this.cityId = cityId;
    }

    @Override
    public String toString() {
        return "MyPojo{" +
            "name='" + name + '\'' +
            ", postCode=" + postCode +
            ", state='" + state + '\'' +
            ", cityId=" + cityId +
            '}';
    }
}
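Coming back to the delimiter suggestion at the start of this answer, here is a minimal sketch of what parsing a delimited line could look like. The choice of ';' as the delimiter, the class name DelimitedLineDemo and the sample line are assumptions; any character that cannot occur inside a field would do.

```java
class DelimitedLineDemo {

    // Assumption: the file uses ';' between columns, so fields may contain spaces.
    static String[] parse(String line) {
        // The limit -1 keeps trailing empty fields instead of silently dropping them.
        String[] fields = line.split(";", -1);
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].trim(); // tolerate padding around the delimiter
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] fields = parse("Mary Ann; 1234; North Carolina; 1268");
        // prints: Mary Ann | 1234 | North Carolina | 1268
        System.out.println(String.join(" | ", fields));
    }
}
```

With an explicit delimiter, a compound value such as "Mary Ann" no longer depends on how many spaces separate the columns.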

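Then, I suppose you expect to run into errors once you validate the lines, so it is a good idea to store them in some kind of Error class (a properly designed one could extend the Exception class, perhaps?). A very simple class for this purpose is: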

class InsertionError {
    private String message;
    private int lineNumber;

    public InsertionError(String message, int lineNumber) {
        this.message = message;
        this.lineNumber = lineNumber;
    }

    @Override
    public String toString() {
        return "Error at line " + lineNumber + " -> " + message;
    }
}
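If you would rather follow the idea of extending Exception, a sketch could look like the following. The names LineParseException, LineParseExceptionDemo and parseNumber are hypothetical; the message format simply mirrors the toString() of InsertionError.

```java
// Hypothetical checked exception that carries the offending line number.
class LineParseException extends Exception {
    private final int lineNumber;

    LineParseException(String message, int lineNumber) {
        super("Error at line " + lineNumber + " -> " + message);
        this.lineNumber = lineNumber;
    }

    int getLineNumber() {
        return lineNumber;
    }
}

class LineParseExceptionDemo {

    // Parses one numeric column, translating the low-level failure
    // into an exception that knows which line it came from.
    static int parseNumber(String token, int lineNumber) throws LineParseException {
        try {
            return Integer.parseInt(token);
        } catch (NumberFormatException e) {
            throw new LineParseException("\"" + token + "\" is not a numeric value.", lineNumber);
        }
    }

    public static void main(String[] args) {
        try {
            parseNumber("asd", 4);
        } catch (LineParseException e) {
            // prints: Error at line 4 -> "asd" is not a numeric value.
            System.out.println(e.getMessage());
        }
    }
}
```

An exception stops processing of the current line as soon as it is thrown, whereas collecting plain InsertionError objects makes it easy to report several problems for the same line.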

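Then, the solution itself should:
1. Split the text into lines.
2. Tokenize the columns in each line and parse/validate them.
3. Collect the column data in a useful Java representation.

Maybe something like: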

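// Assumed context: the members below live inside a class, and the file starts with these imports:
// import java.util.ArrayList;
// import java.util.Arrays;
// import java.util.List;
// import java.util.Objects;
// import java.util.stream.Collectors;

private static final int HEADERS_COUNT = 4;
private static final int LINE_NUMBER_CURSOR = 0;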

public static void main(String[] args) {
    String data =   "Marsha      1234     Florida   1268\n" +
                    "Jasmine     Texas    4456\n" +
                    "Jane        1523     Texas     4456\n" +
                    "Jasmine     Texas    2233      asd\n" +
                    "Mark        7253     Georgia   1234";

    int[] lineNumber = new int[1];

    List<InsertionError> errors = new ArrayList<>();

    List<MyPojo> insertedPojo = Arrays.stream(data.split("\n"))
        .map(x -> x.split("\\p{Blank}+"))
        .map(x -> {
            lineNumber[LINE_NUMBER_CURSOR]++;

            if (x.length == HEADERS_COUNT) {
                Integer postCode = null;
                Integer cityId = null;

                try {
                    postCode = Integer.valueOf(x[1]);
                } catch (NumberFormatException ignored) {
                    errors.add(new InsertionError("\"" + x[1] + "\" is not a numeric value.", lineNumber[LINE_NUMBER_CURSOR]));
                }

                try {
                    cityId = Integer.valueOf(x[3]);
                } catch (NumberFormatException ignored) {
                    errors.add(new InsertionError("\"" + x[3] + "\" is not a numeric value.", lineNumber[LINE_NUMBER_CURSOR]));
                }

                if (postCode != null && cityId != null) {
                    return new MyPojo(x[0], postCode, x[2], cityId);
                }
            } else {
                errors.add(new InsertionError("Columns count does not match headers count.", lineNumber[LINE_NUMBER_CURSOR]));
            }
            return null;
        })
        .filter(Objects::nonNull)
        .collect(Collectors.toList());

    errors.forEach(System.out::println);

    System.out.println("Number of successfully inserted Pojos is " + insertedPojo.size() + ". Respectively they are: ");

    insertedPojo.forEach(System.out::println);
}

which prints:

Error at line 2 -> Columns count does not match headers count.
Error at line 4 -> "Texas" is not a numeric value.
Error at line 4 -> "asd" is not a numeric value.
Number of successfully inserted Pojos is 3. Respectively they are: 
MyPojo{name='Marsha', postCode=1234, state='Florida', cityId=1268}
MyPojo{name='Jane', postCode=1523, state='Texas', cityId=4456}
MyPojo{name='Mark', postCode=7253, state='Georgia', cityId=1234}

4 answers

Answer 1    1    2018-10-11 09:20:51
Answer 2    1    2018-10-11 09:28:31
Answer 3    0    2018-10-11 10:45:58
Answer 4    0    2018-10-11 10:48:23
