Java: remove duplicates from a file by searching on String array[0]

Question

I have a long text file.

Now I want to remove duplicates from the file. The problem is that the search key is the first word of each line, where the words are separated by ":".

For example:

The file lines:

11234567:229283:29833204:2394803
11234567:4577546765:655776:564456456
43523:455543:54335434:53445
11234567:43455:544354:5443

The result should be:

11234567:229283:29833204:2394803
43523:455543:54335434:53445

I need to keep the first line of each group of duplicates; the others should be ignored.

I tried this:

Set<String> lines11;
try (BufferedReader reader11 = new BufferedReader(new FileReader("test.txt"))) {
    lines11 = new HashSet<>(10000); // maybe should be bigger
    String line11;
    while ((line11 = reader11.readLine()) != null) {
        lines11.add(line11);
    }
}
try (BufferedWriter writer11 = new BufferedWriter(new FileWriter("test.txt"))) {
    for (String unique : lines11) {
        writer11.write(unique);
        writer11.newLine();
    }
}

That works, but it only removes a line when the complete line is duplicated.

How can I change it so that it looks at the first word in every line and checks for duplicates there: when no duplicate is found, save the complete line; if it is a duplicate, ignore the line?

Answer 1

You need to maintain a Set<String> that holds only the first word of each line.

List<String> lines11;
Set<String> dups;
try (BufferedReader reader11 = new BufferedReader(new FileReader("test.txt"))) {
    lines11 = new ArrayList<>();
    dups = new HashSet<>();
    String line11;
    while ((line11 = reader11.readLine()) != null) {
        String first = line11.split(":")[0]; // assuming your separator is :
        if (!dups.contains(first)) {
            lines11.add(line11);
            dups.add(first);
        }
    }
}
try (BufferedWriter writer11 = new BufferedWriter(new FileWriter("test.txt"))) {
    for (String unique : lines11) {
        writer11.write(unique);
        writer11.newLine();
    }
}
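To check the idea against the sample lines from the question without touching a file, here is a minimal in-memory sketch. The class name FirstFieldDedup and the method dedupe are made up for illustration; it also relies on Set.add returning false for an element that is already present, instead of calling contains first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, not part of the original answer.
final class FirstFieldDedup {

    // Keeps the first line seen for each key; the key is the text before the first ':'.
    static List<String> dedupe(List<String> lines) {
        Set<String> seenKeys = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            int sep = line.indexOf(':');
            // A line without ':' is treated as being its own key.
            String key = sep < 0 ? line : line.substring(0, sep);
            // add() returns true only the first time a key is inserted.
            if (seenKeys.add(key)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "11234567:229283:29833204:2394803",
                "11234567:4577546765:655776:564456456",
                "43523:455543:54335434:53445",
                "11234567:43455:544354:5443");
        // Prints the first and the third input line.
        for (String line : dedupe(input)) {
            System.out.println(line);
        }
    }
}
```

Because the kept lines go into an ArrayList, the output keeps the original file order; the HashSet is only used for the lookup.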

Answer 2

I will only write the part that adds the lines to the collection, using a HashMap:

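    String[] tmp;
    // key: first word of the line, value: the first complete line seen with that key
    HashMap<String, String> lines = new HashMap<String, String>();
    String line11;

    while ((line11 = reader11.readLine()) != null) {
        tmp = line11.split(":");
        // only the first line for each key is stored; later duplicates are skipped
        if (!lines.containsKey(tmp[0])) {
            lines.put(tmp[0], line11);
        }
    }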

So the loop adds only unique lines, using the first word as the key.
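One thing to watch with a plain HashMap: it makes no guarantee about iteration order, so writing lines.values() back to the file may shuffle the lines. If the original order matters, a LinkedHashMap is a drop-in alternative. The sketch below is only an illustration (the names FirstWordMap and uniqueByFirstWord are invented here), and it uses split(":", 2) so that a line consisting only of separators still yields an element at index 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of the original answer.
final class FirstWordMap {

    // Maps first word -> first complete line with that word, in insertion order.
    static List<String> uniqueByFirstWord(List<String> lines) {
        Map<String, String> byKey = new LinkedHashMap<>();
        for (String line : lines) {
            // Limit 2: split at most once and keep trailing empty strings,
            // so index 0 always exists.
            String key = line.split(":", 2)[0];
            // putIfAbsent stores the line only if the key is not mapped yet.
            byKey.putIfAbsent(key, line);
        }
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "11234567:229283:29833204:2394803",
                "11234567:4577546765:655776:564456456",
                "43523:455543:54335434:53445",
                "11234567:43455:544354:5443");
        // Lines come out in the order their keys first appeared in the input.
        uniqueByFirstWord(input).forEach(System.out::println);
    }
}
```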

Answer 3

You can keep the lines in a list and use one additional set that holds the first word of each line. For every new line, try to add its first word to the set: if the word is already in the set, it is not added and add() returns false. Based on that result you can add the line to the list, or write it directly with your BufferedWriter.


List<String> lines11;
Set<String> uniqueRecords;
try (BufferedReader reader11 = new BufferedReader(new FileReader("test.txt"))) {
    lines11 = new ArrayList<>(); // no need to give a size, it grows dynamically
    uniqueRecords = new HashSet<>();
    String line11;
    while ((line11 = reader11.readLine()) != null) {
        // the first word is everything before the first ':' (the whole line if there is none)
        int sep = line11.indexOf(':');
        String firstWord = sep < 0 ? line11 : line11.substring(0, sep);
        // add() returns false when the word is already in the set
        if (uniqueRecords.add(firstWord)) {
            lines11.add(line11);
        }
    }
}
try (BufferedWriter writer11 = new BufferedWriter(new FileWriter("test.txt"))) {
    for (String unique : lines11) {
        writer11.write(unique);
        writer11.newLine();
    }
}
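The "write it directly" variant mentioned above could look roughly like the following sketch. It is not from the original answer: the names StreamingDedup and dedupeStreaming are invented, UTF-8 is an assumed encoding, and the output goes to a temporary file in the same directory that replaces the original afterwards, since the same file cannot safely be read and overwritten at the same time.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class, not part of the original answer.
final class StreamingDedup {

    // Copies each line whose first word is new to a temp file, then replaces the original.
    static void dedupeStreaming(Path file) throws IOException {
        Path dir = file.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "dedup", ".tmp");
        Set<String> seen = new HashSet<>(); // only the first words are held in memory
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int sep = line.indexOf(':');
                String firstWord = sep < 0 ? line : line.substring(0, sep);
                if (seen.add(firstWord)) { // true only for the first occurrence
                    writer.write(line);
                    writer.newLine();
                }
            }
        }
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Demo on a throwaway file instead of test.txt.
        Path demo = Files.createTempFile("dedup-demo", ".txt");
        Files.write(demo, Arrays.asList(
                "11234567:229283:29833204:2394803",
                "11234567:4577546765:655776:564456456",
                "43523:455543:54335434:53445",
                "11234567:43455:544354:5443"), StandardCharsets.UTF_8);
        dedupeStreaming(demo);
        Files.readAllLines(demo, StandardCharsets.UTF_8).forEach(System.out::println);
        Files.delete(demo);
    }
}
```

With this layout only the set of first words has to fit in memory, not the whole file, which may matter for a very long input.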

3 answers

Answer 1
0 accepted 2015-04-06 14:52:23

Answer 2
0 2015-04-06 14:53:56

Answer 3
0 2015-04-06 14:54:44
