
How do I find tabbed lines in text file in Java?

I have text files that are laid out like the following.

Product Name
    HP Compaq Elite 8300 CMT

(HP Compaq Elite 8300 CMT is on its own line and has one tab space in front of it)

I am trying to find a way to read line by line and remove lines starting with the tab. First I am turning the file into a string list:

public static List<String> readFile2(File file) throws IOException {
    FileInputStream fis = new FileInputStream(file);
    List<String> list = new ArrayList<>();
    //Construct BufferedReader from InputStreamReader
    BufferedReader br = new BufferedReader(new InputStreamReader(fis));

    String line = null;
    while ((line = br.readLine()) != null) {
        list.add(br.readLine());
    }

    br.close();
    return list;
}

and I have tried many different statements in a loop when reading the list but the correct lines are not returned:

for(int i=0; i<list.size(); i++)

    {
        if(list.get(i).indexOf("\u0009")>-1 || list.get(i).contains("\u0009") || list.get(i).indexOf((char)9)>-1 || list.get(i).startsWith(" ") || list.get(i).startsWith("\t"))
        {
        list.remove(i);
        }
    }

any suggestions? Thanks!

Java's String class has a startsWith method that lets you test whether a String starts with a given prefix. You can use it to identify lines that start with a tab character: test each line as you read it from the buffer, and don't add it to your list in the first place.

String line = null;
while ((line = br.readLine()) != null) {
    if(!line.startsWith("\u0009")) {
        list.add(line);
    }
}
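For reference, here is a minimal self-contained sketch of that loop. The class name TabLineFilter, the helper readNonTabbedLines, and the sample text (including the "Serial Number" line) are made up for illustration, and a StringReader stands in for the real file. Note that the loop adds the line variable it has just read; calling br.readLine() a second time inside the loop body, as the question's readFile2 does, would consume and skip every other line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class TabLineFilter {
    // Hypothetical helper: returns every line that does NOT start with a tab.
    public static List<String> readNonTabbedLines(BufferedReader reader) throws IOException {
        List<String> kept = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            // Add the line we just read; do not call readLine() again here.
            if (!line.startsWith("\t")) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample input; a StringReader stands in for the file.
        String sample = "Product Name\n\tHP Compaq Elite 8300 CMT\nSerial Number\n";
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            System.out.println(readNonTabbedLines(reader));
        }
    }
}
```

To read a real file, the same helper could be given a reader obtained from java.nio.file.Files.newBufferedReader(file.toPath()).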

Other answers have suggested (better; see footnote 1) alternative approaches that avoid putting the matched lines into the list in the first place.

Here's an explanation of why your version doesn't work:

for (int i = 0; i < list.size(); i++) {
    if (/* match line */) {
        list.remove(i);
    }
}

The problem is that when you remove the ith list element, all elements at larger indexes shift down by one; e.g. what was list.get(i + 1) becomes list.get(i), and so on.

But the next thing the loop does is increment i. So, in effect, whenever you remove an element, the element that follows it never gets checked.
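A small demonstration of that skipping effect, as a sketch: the class name SkippedElementDemo, the method buggyRemove, and the sample lines are invented for illustration, and startsWith("\t") stands in for the match test.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SkippedElementDemo {
    // The loop shape from the question: i is incremented even after a removal.
    public static List<String> buggyRemove(List<String> input) {
        List<String> list = new ArrayList<>(input);
        for (int i = 0; i < list.size(); i++) {
            if (list.get(i).startsWith("\t")) {
                list.remove(i); // the following element slides into index i ...
            }                   // ... and i++ then steps over it
        }
        return list;
    }

    public static void main(String[] args) {
        // Two tabbed lines in a row: the second one survives the loop.
        List<String> lines = Arrays.asList("A", "\tb", "\tc", "D");
        System.out.println(buggyRemove(lines));
    }
}
```

With two tabbed lines in a row, the first is removed, the second moves into its slot, and the loop moves on without testing it, so "\tc" is still in the result.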

Here is a correct way to do it:

int i = 0;
while (i < list.size()) {
    if (/* match line */) {
        list.remove(i);
    } else {
        i++;
    }
}

Note that you DON'T increment i if you removed the ith element.
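As a runnable sketch of the same idea, with startsWith("\t") filled in as the match test (the class and method names here are hypothetical). It also shows Collection.removeIf, available since Java 8, as an alternative that handles the index bookkeeping for you.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RemoveTabbedLines {
    // Index-based removal: advance i only when nothing was removed.
    public static void removeWithIndex(List<String> lines) {
        int i = 0;
        while (i < lines.size()) {
            if (lines.get(i).startsWith("\t")) {
                lines.remove(i);
            } else {
                i++;
            }
        }
    }

    // Java 8+ alternative: the collection removes every matching element itself.
    public static void removeWithPredicate(List<String> lines) {
        lines.removeIf(line -> line.startsWith("\t"));
    }

    public static void main(String[] args) {
        // Made-up sample with two tabbed lines in a row.
        List<String> a = new ArrayList<>(Arrays.asList("A", "\tb", "\tc", "D"));
        List<String> b = new ArrayList<>(a);
        removeWithIndex(a);
        removeWithPredicate(b);
        System.out.println(a);
        System.out.println(b);
    }
}
```

Both methods modify the list in place, so the list passed in must be mutable (for example an ArrayList, not the fixed-size list returned by Arrays.asList).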


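For the record, any one of the TAB tests that you used (indexOf, contains, or startsWith with a TAB character) was sufficient on its own to match a line containing a TAB, though only startsWith checks that the TAB is at the start of the line. Writing the same test lots of different ways did not help. There is a lesson in that for you ...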


1 - It is simpler (less code), and also significantly more efficient if you are processing a large file. Removing an element from an arbitrary position in an ArrayList is an O(N) operation.
