Need to remove duplicates from a text file, comparing the 1st and 5th strings of each line

Question

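As part of a project I'm working on, I'd like to clean up a file I generate by removing duplicate line entries. These duplicates often won't occur near each other, however. I came up with a way of doing this in Java (it finds duplicates in the file by storing the two strings in two ArrayLists and iterating over them), but it doesn't work correctly because of the nested for loops.

I need an integrated solution for this, preferably in Java. Any ideas?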

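    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.StringTokenizer;

    public class duplicates {
        static BufferedReader reader = null;
        static BufferedWriter writer = null;
        static String currentLine;

        public static void main(String[] args) throws IOException {
            int count=0,linecount=0;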
            String fe = null,fie = null,pe=null;
            File file = new File("E:\\Book.txt");

            ArrayList<String> list1=new ArrayList<String>();
            ArrayList<String> list2=new ArrayList<String>();

            reader = new BufferedReader(new FileReader(file));

            while((currentLine = reader.readLine()) != null)
            {
                StringTokenizer st = new StringTokenizer(currentLine,"/");  //splits data into strings
                while (st.hasMoreElements()) {
                    count++;
                    fe=(String) st.nextElement();
                    //System.out.print(fe+"/// ");

                    //System.out.println("count="+count);
                    if(count==1){                                            //stores 1st string 
                        pe=fe;
                        //  System.out.println("first element "+fe);
                    }
                    else if(count==5){
                        fie=fe;                                              //stores 5th string
                        //  System.out.println("fifth element "+fie);
                    }
                }
                count=0;

                if(linecount>0){
                    for(String s1:list1)
                    {
                        for(String s2:list2){
                            if(pe.equals(s1)&&fie.equals(s2)){                              //checking condition
                                System.out.println("duplicate found");
                                //System.out.println(s1+ "   "+s2);
                            }        
                        }
                    }
                }                     
                list1.add(pe);
                list2.add(fie);
                linecount++;
            }
        }
    }

Input:

/book1/_cwc/B737/customer/Special_Reports/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/book1/_cwc/B737/customer/Special_Reports/
/jangeer/_cwc/Crj_200/customer/plots/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/
/jangeer/_cwc/Crj_200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/01_Highlights/
/jangeer/_cwc/ERJ170/customer/01_Highlights/

Output:

/book1/_cwc/B737/customer/Special_Reports/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/plots/
/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/
/jangeer/_cwc/Crj_200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/01_Highlights/

Answer 1

    import java.util.Arrays;
    import java.util.List;
    import java.util.stream.Collectors;

    public static void removeDups() {
        // Let's say you have read the whole file into this string array.
        String[] input = new String[] {
                "/book1/_cwc/B737/customer/Special_Reports/",
                "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
                "/book1/_cwc/B737/customer/Special_Reports/",
                "/jangeer/_cwc/Crj_200/customer/plots/",
                "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
                "/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/",
                "/jangeer/_cwc/Crj_200/customer/02_Watchlists/",
                "/jangeer/_cwc/Crj_200/customer/01_Highlights/",
                "/jangeer/_cwc/ERJ170/customer/01_Highlights/"
        };
        // Java 8 streams: distinct() keeps the first occurrence of each line, in order.
        List<String> output = Arrays.stream(input).distinct().collect(Collectors.toList());
        // Printed here just as an example; write the output back to the file
        // using your own implementation.
        output.forEach(System.out::println);
    }

The output when I run this method is:

/book1/_cwc/B737/customer/Special_Reports/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/plots/
/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/
/jangeer/_cwc/Crj_200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/01_Highlights/
/jangeer/_cwc/ERJ170/customer/01_Highlights/
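
Note that `distinct()` compares whole lines, so the ERJ170 line is kept here, while the expected output in the question drops it (its first and fifth fields match the line before it). If the intent is to treat two lines as duplicates whenever their first and fifth "/"-separated fields match, something along these lines might work. This is only a sketch: the class and method names are made up for illustration, and falling back to the whole line when fewer than five fields are present is an assumption, not something the question specifies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.StringTokenizer;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the original answer.
class KeyedDistinct {
    // Builds the comparison key from the 1st and 5th "/"-separated fields.
    // Assumption: lines with fewer than five fields are compared as a whole.
    static List<String> key(String line) {
        List<String> fields = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, "/");
        while (st.hasMoreTokens()) {
            fields.add(st.nextToken());
        }
        return fields.size() >= 5
                ? Arrays.asList(fields.get(0), fields.get(4))
                : Collections.singletonList(line);
    }

    // Keeps the first line seen for each key, preserving input order.
    // Intended for a sequential stream, since the filter relies on shared state.
    static List<String> distinctByKey(List<String> lines) {
        Set<List<String>> seen = new HashSet<>();
        return lines.stream()
                // Set.add returns false if the key was already present.
                .filter(line -> seen.add(key(line)))
                .collect(Collectors.toList());
    }
}
```

Calling `KeyedDistinct.distinctByKey(Arrays.asList(input))` on the sample array should then leave six lines, without the ERJ170 entry.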

Edit

Non-Java 8 answer

    import java.util.Arrays;
    import java.util.LinkedHashSet;

    public static void removeDups() {
        String[] input = new String[] {
                "/book1/_cwc/B737/customer/Special_Reports/",
                "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
                "/book1/_cwc/B737/customer/Special_Reports/",
                "/jangeer/_cwc/Crj_200/customer/plots/",
                "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
                "/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/",
                "/jangeer/_cwc/Crj_200/customer/02_Watchlists/",
                "/jangeer/_cwc/Crj_200/customer/01_Highlights/",
                "/jangeer/_cwc/ERJ170/customer/01_Highlights/"
        };

        // output is your set of unique strings, in their original order.
        LinkedHashSet<String> output = new LinkedHashSet<String>(Arrays.asList(input));

        // Printed here for illustration; write the lines back to the file as needed.
        for (String line : output) {
            System.out.println(line);
        }
    }

Answer 2

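Use a Set<String> instead of an ArrayList<String>.

Duplicates are not allowed in a set, so if you simply add all the lines to it and then read them back out, you will have only the distinct strings. (Use a LinkedHashSet if the original line order must be preserved.)

Performance-wise, it is also faster than the nested for loops: a hash-based set checks whether an entry is already present in roughly constant time on average, instead of scanning every previously stored entry.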
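
To make the idea concrete, here is a minimal sketch that applies a set while streaming the file line by line, so only unique lines reach the output file. The class name, method name, and the use of UTF-8 are assumptions for illustration; it compares whole lines, as described above.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of the original answer.
class SetDedupFile {
    // Copies 'in' to 'out', skipping any line that has already been written.
    // Returns the number of duplicate lines that were dropped.
    static int dedupe(Path in, Path out) throws IOException {
        Set<String> seen = new HashSet<>();
        int dropped = 0;
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Set.add returns true only the first time a line is seen.
                if (seen.add(line)) {
                    writer.write(line);
                    writer.newLine();
                } else {
                    dropped++;
                }
            }
        }
        return dropped;
    }
}
```

Because lines are written as they are first encountered, the output keeps the input order even with a plain HashSet. The input and output paths should be different files, since the input is still being read while the output is written.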

Solution 1
1 2015-11-01 15:31:37

Solution 2
1 Accepted 2015-11-01 16:23:22
