Need to remove duplicates from a text file by comparing the first and fifth strings of each line

Question

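As part of a project I am working on, I want to clean up a file I generate that contains duplicate line entries. The duplicates are usually not adjacent to each other, though. I came up with a way to do this in Java (basically, to find the duplicates in the file, I store the two strings in two ArrayLists and iterate over them), but it does not work properly because of the nested for loops.

However, I need an integrated solution, preferably in Java. Any ideas?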

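    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.StringTokenizer;

    public class duplicates {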
        static BufferedReader reader = null;
        static BufferedWriter writer = null;
        static String currentLine;

        public static void main(String[] args) throws IOException {
            int count = 0, linecount = 0;
            String fe = null,fie = null,pe=null;
            File file = new File("E:\\Book.txt");

            ArrayList<String> list1=new ArrayList<String>();
            ArrayList<String> list2=new ArrayList<String>();

            reader = new BufferedReader(new FileReader(file));

            while((currentLine = reader.readLine()) != null)
            {
                StringTokenizer st = new StringTokenizer(currentLine,"/");  //splits data into strings
                while (st.hasMoreElements()) {
                    count++;
                    fe=(String) st.nextElement();
                    //System.out.print(fe+"/// ");

                    //System.out.println("count="+count);
                    if(count==1){                                            //stores 1st string 
                        pe=fe;
                        //  System.out.println("first element "+fe);
                    }
                    else if(count==5){
                        fie=fe;                                              //stores 5th string
                        //  System.out.println("fifth element "+fie);
                    }
                }
                count=0;

                if(linecount>0){
                    for(String s1:list1)
                    {
                        for(String s2:list2){
                            if(pe.equals(s1)&&fie.equals(s2)){                              //checking condition
                                System.out.println("duplicate found");
                                //System.out.println(s1+ "   "+s2);
                            }        
                        }
                    }
                }                     
                list1.add(pe);
                list2.add(fie);
                linecount++;
            }
        }
    }

Input:

/book1/_cwc/B737/customer/Special_Reports/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/book1/_cwc/B737/customer/Special_Reports/
/jangeer/_cwc/Crj_200/customer/plots/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/
/jangeer/_cwc/Crj_200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/01_Highlights/
/jangeer/_cwc/ERJ170/customer/01_Highlights/

Expected output:

/book1/_cwc/B737/customer/Special_Reports/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/plots/
/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/
/jangeer/_cwc/Crj_200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/01_Highlights/

Answer 1

    import java.util.Arrays;
    import java.util.List;
    import java.util.stream.Collectors;

    public static void removeDups() {
        // Let's say you have read the whole file into this string array.
        String[] input = new String[] {
            "/book1/_cwc/B737/customer/Special_Reports/",
            "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
            "/book1/_cwc/B737/customer/Special_Reports/",
            "/jangeer/_cwc/Crj_200/customer/plots/",
            "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
            "/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/",
            "/jangeer/_cwc/Crj_200/customer/02_Watchlists/",
            "/jangeer/_cwc/Crj_200/customer/01_Highlights/",
            "/jangeer/_cwc/ERJ170/customer/01_Highlights/"
        };
        // With Java 8 streams, distinct() drops repeated lines and keeps first-seen order.
        List<String> output = Arrays.stream(input).distinct().collect(Collectors.toList());
        // Printed here just as an example; write the list back to the file using your own implementation.
        output.forEach(System.out::println);
    }

The output when I run this method is:

/book1/_cwc/B737/customer/Special_Reports/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/plots/
/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/
/jangeer/_cwc/Crj_200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/01_Highlights/
/jangeer/_cwc/ERJ170/customer/01_Highlights/
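
The method above leaves reading and writing the file to the reader. Here is a minimal sketch of that step, assuming UTF-8 text and a file small enough to load into memory. The class name `WholeLineDedup` and its method names are made up for illustration. Like the method above, it compares whole lines, not just the first and fifth strings.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

final class WholeLineDedup {
    // Returns the distinct lines in first-seen order.
    static List<String> distinctLines(List<String> lines) {
        return new ArrayList<>(new LinkedHashSet<>(lines));
    }

    // Reads every line of `in`, drops repeated lines, and writes the rest to `out`.
    static void dedupFile(Path in, Path out) throws IOException {
        List<String> lines = Files.readAllLines(in, StandardCharsets.UTF_8);
        Files.write(out, distinctLines(lines), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java WholeLineDedup <input file> <output file>");
            return;
        }
        dedupFile(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Writing to a separate output file keeps the original intact until the result has been checked.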

Edit

Non-Java 8 answer

    import java.util.Arrays;
    import java.util.LinkedHashSet;

    public static void removeDups() {
        String[] input = new String[] {
            "/book1/_cwc/B737/customer/Special_Reports/",
            "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
            "/book1/_cwc/B737/customer/Special_Reports/",
            "/jangeer/_cwc/Crj_200/customer/plots/",
            "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
            "/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/",
            "/jangeer/_cwc/Crj_200/customer/02_Watchlists/",
            "/jangeer/_cwc/Crj_200/customer/01_Highlights/",
            "/jangeer/_cwc/ERJ170/customer/01_Highlights/"
        };

        // output is your set of unique strings, in their original order.
        LinkedHashSet<String> output = new LinkedHashSet<String>(Arrays.asList(input));
    }

Answer 2

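Use a Set<String> instead of an ArrayList<String>.

A set does not allow duplicates, so if you simply add every line to it and then read the lines back out, you are left with only the distinct strings.

It is also faster than the nested for loops: a hash-based set checks whether an entry is already present in roughly constant time, while the nested loops rescan every earlier line.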
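
To compare only the first and fifth strings, as the question asks, the set can hold a key built from those two fields instead of the whole line. The following is a sketch under that assumption, not a definitive implementation: `KeyDedup`, `keyOf`, and `dedup` are hypothetical names, fields are split on "/" with StringTokenizer as in the question's code, and a line with fewer than five fields is assumed to have an empty fifth field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.StringTokenizer;

final class KeyDedup {
    // Builds the comparison key from the 1st and 5th "/"-separated fields.
    // StringTokenizer skips empty tokens, so the leading "/" adds no field.
    static String keyOf(String line) {
        StringTokenizer st = new StringTokenizer(line, "/");
        String first = "";
        String fifth = ""; // assumption: a missing field counts as empty
        for (int i = 1; st.hasMoreTokens(); i++) {
            String token = st.nextToken();
            if (i == 1) {
                first = token;
            } else if (i == 5) {
                fifth = token;
            }
        }
        return first + "/" + fifth;
    }

    // Keeps the first line seen for each key, in the original order.
    static List<String> dedup(List<String> lines) {
        Set<String> seen = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            // add() returns false when the key is already in the set
            if (seen.add(keyOf(line))) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "/book1/_cwc/B737/customer/Special_Reports/",
            "/book1/_cwc/B737/customer/Special_Reports/",
            "/jangeer/_cwc/Crj_200/customer/01_Highlights/",
            "/jangeer/_cwc/ERJ170/customer/01_Highlights/");
        for (String line : dedup(input)) {
            System.out.println(line);
        }
    }
}
```

On the sample input from the question this keeps six lines and drops the ERJ170 line, because its first and fifth strings (jangeer and 01_Highlights) match an earlier line. That agrees with the expected output in the question.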

Answer 1: 1, 2015-11-01 15:31:37
Answer 2: 1, accepted, 2015-11-01 16:23:22
