.txt file to arrays using Java

I have a .txt file containing document information for 1400 documents. Each document has an ID, title, author, area and abstract. A sample looks like this:

.I 1
.T
experimental investigation of the aerodynamics of a
wing in a slipstream .
.A
brenckman,m.
.B
j. ae. scs. 25, 1958, 324.
.W
experimental investigation of the aerodynamics of a
wing in a slipstream .
  [...]
the specific configuration of the experiment .

I want to put each of these into five arrays, one per category. I'm having trouble inserting the title and abstract into a single array position. Can anyone tell me what's wrong with this code? What I am trying to do is insert the text lines into position x after a ".T" is read and stop when a ".A" is found; when that happens, the position should increase by 1 so that the next position gets filled.

try{
    collection = new File (File location);
    fr = new FileReader (collection);
    br = new BufferedReader(fr);
    String numDoc = " ";
    int pos = 0;
    while((numDoc=br.readLine())!=null){
        if(numDoc.contains(".T")){
            while((numDoc=br.readLine())!= null && !numDoc.contains(".A")){
                Title[pos] = Title[pos] + numDoc; 
                pos++;
           }

        }
    }
}
catch(Exception e){
     e.printStackTrace();
}

The goal is to have all the information within a single line of String. Any help would be greatly appreciated.

A code walkthrough is always helpful. In the future, you can probably use breakpoints, but I think I know why you're getting what I assume is a Null Pointer Exception.

while((numDoc=br.readLine())!=null){
    if(numDoc.contains(".T")){
        while((numDoc=br.readLine())!= null && !numDoc.contains(".A")){

Outside, everything looks good. This loop is where things start going bonkers.

            Title[pos] = Title[pos] + numDoc; 

With your provided input, we would set:

Title[0] as Title[0] + "experimental investigation of the aerodynamics of a"

This works cleanly only if Title[0] already holds a string, and I don't assume it has been initialized yet. If the Title array itself was never allocated, you get either a compiler error (an uninitialized local variable) or a run-time null pointer exception (a null field). If the array was allocated but the element is still null, Java's string concatenation turns the null into the text "null", so the title ends up starting with "null". We'll address that first by checking for a null array element.
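To make the null-element case concrete, here is a minimal sketch. The class and method names (NullConcatDemo, appendToFirstSlot) are made up for illustration, and the array size of 1400 is only an assumption taken from the document count in the question.

```java
class NullConcatDemo {
    // Mirrors the statement Title[pos] = Title[pos] + numDoc from the question.
    static String appendToFirstSlot(String[] titles, String line) {
        // If titles[0] is still null, string concatenation converts it to the text "null".
        titles[0] = titles[0] + line;
        return titles[0];
    }

    public static void main(String[] args) {
        String[] titles = new String[1400]; // allocated, but every element starts out as null
        System.out.println(appendToFirstSlot(titles, "wing in a slipstream ."));
        // prints "nullwing in a slipstream ."
    }
}
```

So with an allocated array there is no exception at this line, just a stray "null" at the front of the string, which is why the null check is still worth having.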

So anyways, we'll deal with a null Title[pos].

while((numDoc=br.readLine())!=null){
    if(numDoc.contains(".T")){
        while((numDoc=br.readLine())!= null && !numDoc.contains(".A")){
            if(Title[pos] != null) {
                Title[pos] = Title[pos] + numDoc; 
            }
            else {
                Title[pos] = numDoc;
            }
            pos++;
        }
    }
}

When we do another walkthrough, we'll get the following array values:

Title[0]=experimental investigation of the aerodynamics of a

Title[1]=wing in a slipstream .

If this is intended, then this is fine. If you wanted the title lines together, then move the pos++ out of the inner while loop.

while((numDoc=br.readLine())!=null){
    if(numDoc.contains(".T")){
        while((numDoc=br.readLine())!= null && !numDoc.contains(".A")){
            if(Title[pos] != null) {
                Title[pos] = Title[pos] + " " + numDoc; // add a space between lines
            }
            else {
                Title[pos] = numDoc;
            }
        }
        pos++; // advance once per title, after the inner loop has finished
    }
}

Then we get:

Title[0]=experimental investigation of the aerodynamics of a wing in a slipstream .

You may want to trim your inputs, but this should cover both of the potential errors that I can see.
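Putting the walkthrough together, a self-contained version might look roughly like the sketch below. It is not the asker's exact code: the names TitleReader and readTitles are hypothetical, it collects titles in a List instead of a fixed-size array, it compares the trimmed line with equals(".T") rather than contains(".T") (assuming the markers always sit on their own line), and it reads from a StringReader so it runs without the real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class TitleReader {
    // Returns one string per ".T" section, with wrapped lines joined by a single space.
    static List<String> readTitles(BufferedReader br) {
        List<String> titles = new ArrayList<>();
        try {
            String line;
            while ((line = br.readLine()) != null) {
                if (!line.trim().equals(".T")) {
                    continue; // not the start of a title section
                }
                StringBuilder title = new StringBuilder();
                // Consume lines until the ".A" marker (or end of input).
                while ((line = br.readLine()) != null && !line.trim().equals(".A")) {
                    if (title.length() > 0) {
                        title.append(' ');
                    }
                    title.append(line.trim());
                }
                titles.add(title.toString()); // one entry per document, like pos++ after the loop
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return titles;
    }

    public static void main(String[] args) {
        String sample = ".I 1\n.T\nexperimental investigation of the aerodynamics of a\n"
                + "wing in a slipstream .\n.A\nbrenckman,m.\n";
        System.out.println(readTitles(new BufferedReader(new StringReader(sample))));
    }
}
```

For the real file you would pass new BufferedReader(new FileReader(collection)) instead of the StringReader. The StringBuilder also sidesteps the null check, since there is no array element to be null in the first place.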

Seriously, seriously, seriously, use Objects. Objects allow you to group related data, and when you're handling all these arrays, you really will confuse yourself. More importantly though, you'll confuse the next person who's going to work on your code.

Example

public class Book {
    private String title;
    private String bookAbstract;

    public Book(String title, String bookAbstract) {
        this.title = title;
        this.bookAbstract = bookAbstract;
    }
}

I've guessed you're parsing books, so I've created a Book class. Conceptually, this will contain everything to do with books. I've added a title field for the title of the book and a bookAbstract field which, as you've guessed, is the book's abstract. This makes your code conceptually much easier to consume, but also much more maintainable. It also makes your goal very simple.

The goal is to have all the information within a single line of String

Parse it and you can use the toString method:

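// Goes inside the Book class. The field is bookAbstract because abstract is a reserved word in Java.
@Override
public String toString() {
    return "Title=" + title + "| Abstract=" + bookAbstract;
}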

Your specific issue

What you're doing is reading up to a line with .T. Once you hit that line, you know that everything up to the line containing .A is data that you want to use. So, if you read the String docs, you'll see that there is an indexOf method:

indexOf(int ch, int fromIndex)

Returns the index within this string of the first occurrence of the specified character, starting the search at the specified index.

That fromIndex value is important here. You know what you're looking for (.A) and you know where you're starting from (.T). Using this information, you can jump through the string, dissecting out the useful bits and passing them into your new Book object.
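As a rough sketch of that idea, the snippet below walks one big string with indexOf. Note the assumptions: it uses the indexOf(String str, int fromIndex) overload rather than the char overload quoted above, the whole file has already been read into a single String with "\n" line endings, each marker sits on its own line, and the names IndexOfTitles and titles are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

class IndexOfTitles {
    // Extracts the text between each ".T" line and the following ".A" line.
    static List<String> titles(String text) {
        List<String> out = new ArrayList<>();
        int from = 0;
        while (true) {
            int marker = text.indexOf("\n.T\n", from); // where we start from
            if (marker < 0) {
                break; // no more titles
            }
            int start = marker + 3; // index of the newline that ends the ".T" line
            int end = text.indexOf("\n.A", start); // what we are looking for
            if (end < 0) {
                break; // unterminated title section; stop rather than guess
            }
            // Join wrapped lines with spaces and drop the leading newline.
            out.add(text.substring(start, end).replace('\n', ' ').trim());
            from = end; // continue the search after this title
        }
        return out;
    }
}
```

Each resulting string could then be handed to the Book constructor. Compared with the line-by-line loop, this needs the whole file in memory, which should be fine for 1400 short records.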

Because you increment pos each time you add a non-.A line, those lines will not go into the same element of Title. I think you want to wait to increment pos until you've read the .A line.
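The same "switch target when a marker line appears" idea extends to all five categories. The following is only a sketch under assumptions: DocumentParser, Doc and the field names are hypothetical, .B is mapped to what the question calls the area, and the input is a list of lines (for example from Files.readAllLines on the real file).

```java
import java.util.ArrayList;
import java.util.List;

class DocumentParser {
    /** One record of the collection; the field names are illustrative. */
    static class Doc {
        String id = "";
        final StringBuilder title = new StringBuilder();
        final StringBuilder author = new StringBuilder();
        final StringBuilder area = new StringBuilder();
        final StringBuilder docAbstract = new StringBuilder();
    }

    static List<Doc> parse(List<String> lines) {
        List<Doc> docs = new ArrayList<>();
        Doc current = null;          // document being filled
        StringBuilder target = null; // field that plain text lines are appended to
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith(".I ")) {
                current = new Doc(); // a new document starts here
                current.id = line.substring(3).trim();
                docs.add(current);
                target = null;
            } else if (current == null) {
                continue; // ignore anything before the first .I line
            } else if (line.equals(".T")) {
                target = current.title;
            } else if (line.equals(".A")) {
                target = current.author;
            } else if (line.equals(".B")) {
                target = current.area;
            } else if (line.equals(".W")) {
                target = current.docAbstract;
            } else if (target != null && !line.isEmpty()) {
                if (target.length() > 0) {
                    target.append(' '); // join wrapped lines with a space
                }
                target.append(line);
            }
        }
        return docs;
    }
}
```

Here nothing like pos needs to be incremented by hand: a new Doc is created only when a .I line is read, so every text line between two markers lands in the same field.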
