How do I find tabbed lines in a text file in Java?

I have text files that are laid out like the following.

Product Name
    HP Compaq Elite 8300 CMT

(HP Compaq Elite 8300 CMT is on its own line and has one tab character in front of it.)

I am trying to find a way to read the file line by line and remove the lines that start with a tab. First I turn the file into a list of strings:

public static List<String> readFile2(File file) throws IOException {
    FileInputStream fis = new FileInputStream(file);
    List<String> list = new ArrayList<>();
    //Construct BufferedReader from InputStreamReader
    BufferedReader br = new BufferedReader(new InputStreamReader(fis));

    String line = null;
    while ((line = br.readLine()) != null) {
        list.add(br.readLine());
    }

    br.close();
    return list;
}

I have tried many different conditions in a loop over the list, but the correct lines are not returned:

for(int i=0; i<list.size(); i++)

    {
        if(list.get(i).indexOf("\u0009")>-1 || list.get(i).contains("\u0009") || list.get(i).indexOf((char)9)>-1 || list.get(i).startsWith(" ") || list.get(i).startsWith("\t"))
        {
        list.remove(i);
        }
    }

Any suggestions? Thanks!

Java's String class has a startsWith method that tests whether the String starts with a given prefix. You can use it to identify lines that start with a tab character. That way you can test each line as you read it from the buffer and never add it to your list in the first place.

String line = null;
while ((line = br.readLine()) != null) {
    if(!line.startsWith("\u0009")) {
        list.add(line);
    }
}
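For completeness, here is a minimal, self-contained sketch of that idea. The class name TabbedLineFilter, the method name readWithoutTabbedLines, and the choice of UTF-8 are assumptions made for illustration; they do not appear in the question. Note also that the question's readFile2 calls br.readLine() twice per iteration (once in the loop condition and once inside list.add), so it discards every other line; the sketch adds the line that was already read.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TabbedLineFilter {
    // Hypothetical helper: returns every line that does NOT start with a tab.
    // UTF-8 is an assumption; use the charset your files are actually written in.
    static List<String> readWithoutTabbedLines(Path file) throws IOException {
        List<String> kept = new ArrayList<>();
        // try-with-resources closes the reader even if readLine() throws
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (!line.startsWith("\t")) {
                    kept.add(line); // add the line already read, do not call readLine() again
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("products", ".txt");
        Files.write(tmp, Arrays.asList("Product Name", "\tHP Compaq Elite 8300 CMT"),
                StandardCharsets.UTF_8);
        System.out.println(readWithoutTabbedLines(tmp)); // prints [Product Name]
        Files.delete(tmp);
    }
}
```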

Other answers have suggested (better, see note 1) alternative approaches that avoid putting the matched lines into the list in the first place.

Here's an explanation of why your version doesn't work:

for (int i = 0; i < list.size(); i++) {
    if (/* match line */) {
        list.remove(i);
    }
}

The problem is that when you remove the i-th list element, all elements at larger indexes get "renumbered"; e.g. list.get(i + 1) becomes list.get(i), and so on.

But the next thing you do is increment i. So, in effect, whenever you remove an element, the element that follows it never gets checked.
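To make the skipped check concrete, here is a small sketch that reproduces the loop shape from the question on a made-up list with two consecutive tabbed lines. The class name SkippedElementDemo, the method name buggyRemove, and the sample strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SkippedElementDemo {
    // Same shape as the question's loop: remove inside an index-based for loop.
    static List<String> buggyRemove(List<String> input) {
        List<String> list = new ArrayList<>(input);
        for (int i = 0; i < list.size(); i++) {
            if (list.get(i).startsWith("\t")) {
                list.remove(i); // later elements shift left by one, then i++ steps over one
            }
        }
        return list;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Product Name", "\tline A", "\tline B", "Serial");
        // "\tline A" is removed at i = 1; "\tline B" moves to index 1,
        // but the loop continues at i = 2, so "\tline B" survives.
        System.out.println(buggyRemove(lines).size()); // prints 3
    }
}
```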

Here is a correct way to do it:

int i = 0;
while (i < list.size()) {
    if (/* match line */) {
        list.remove(i);
    } else {
        i++;
    }
}

Note that you DON'T increment i if you removed the i-th element.
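Another way to get the same effect, not part of the answer above, is to let an Iterator keep track of the position and call its remove method. This is only a sketch; the class name IteratorRemoval and the method name removeTabbed are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class IteratorRemoval {
    // Removes, in place, every line that starts with a tab.
    // The list must support removal through its iterator (ArrayList does).
    static void removeTabbed(List<String> list) {
        Iterator<String> it = list.iterator();
        while (it.hasNext()) {
            if (it.next().startsWith("\t")) {
                it.remove(); // removes the element just returned by next()
            }
        }
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>(
                Arrays.asList("Product Name", "\tline A", "\tline B", "Serial"));
        removeTabbed(lines);
        System.out.println(lines); // prints [Product Name, Serial]
    }
}
```

The iterator adjusts its own cursor after each removal, so there is no index to forget to hold back.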


For the record, any one of those tests that you used was sufficient to match a line containing a TAB. Writing the same test in lots of different ways did not help. There is a lesson in that for you ...


1 - It is simpler (less code), and also significantly more efficient if you are processing a large file. Removing an element from an arbitrary position in an ArrayList is an O(N) operation.
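If the lines are already in a list, one way to avoid repeated O(N) removals (assuming Java 8 or later) is Collection.removeIf, which takes a predicate and drops every matching element in one call. The sketch below is an illustration rather than something proposed in the answers; the class name RemoveIfExample and the method name dropTabbed are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RemoveIfExample {
    // Drops all tab-prefixed lines in place and reports whether anything was removed.
    static boolean dropTabbed(List<String> list) {
        return list.removeIf(line -> line.startsWith("\t"));
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>(
                Arrays.asList("Product Name", "\tHP Compaq Elite 8300 CMT", "Serial"));
        boolean changed = dropTabbed(lines);
        System.out.println(changed + " " + lines); // prints true [Product Name, Serial]
    }
}
```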
