As part of a project I'm working on, I'd like to remove duplicate line entries from a file I generate. These duplicates often won't occur near each other, however. I came up with a method of doing so in Java: to find duplicates in the file, I store two strings from each line in two ArrayLists and iterate over them. It is not working, though; because of the nested for loops, the condition is satisfied in many unintended ways.
I need an integrated solution for this, preferably in Java. Any ideas?
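import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.StringTokenizer;
public class duplicates {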
static BufferedReader reader = null;
static BufferedWriter writer = null;
static String currentLine;
public static void main(String[] args) throws IOException {
int count=0,linecount=0;
String fe = null,fie = null,pe=null;
File file = new File("E:\\Book.txt");
ArrayList<String> list1=new ArrayList<String>();
ArrayList<String> list2=new ArrayList<String>();
reader = new BufferedReader(new FileReader(file));
while((currentLine = reader.readLine()) != null)
{
StringTokenizer st = new StringTokenizer(currentLine,"/"); //splits data into strings
while (st.hasMoreElements()) {
count++;
fe=(String) st.nextElement();
//System.out.print(fe+"/// ");
//System.out.println("count="+count);
if(count==1){ //stores 1st string
pe=fe;
// System.out.println("first element "+fe);
}
else if(count==5){
fie=fe; //stores 5th string
// System.out.println("fifth element "+fie);
}
}
count=0;
if(linecount>0){
for(String s1:list1)
{
for(String s2:list2){
if(pe.equals(s1)&&fie.equals(s2)){ //checking condition
System.out.println("duplicate found");
//System.out.println(s1+ " "+s2);
}
}
}
}
list1.add(pe);
list2.add(fie);
linecount++;
}
}
}
Input:
/book1/_cwc/B737/customer/Special_Reports/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/book1/_cwc/B737/customer/Special_Reports/
/jangeer/_cwc/Crj_200/customer/plots/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/
/jangeer/_cwc/Crj_200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/01_Highlights/
/jangeer/_cwc/ERJ170/customer/01_Highlights/
Expected output:
/book1/_cwc/B737/customer/Special_Reports/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/plots/
/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/
/jangeer/_cwc/Crj_200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/01_Highlights/
public static void removeDups() {
String[] input = new String[] { // Let's say the whole file has been read into this String array
"/book1/_cwc/B737/customer/Special_Reports/",
"/Airbook/_cwc/A330-200/customer/02_Watchlists/",
"/book1/_cwc/B737/customer/Special_Reports/",
"/jangeer/_cwc/Crj_200/customer/plots/",
"/Airbook/_cwc/A330-200/customer/02_Watchlists/",
"/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/",
"/jangeer/_cwc/Crj_200/customer/02_Watchlists/",
"/jangeer/_cwc/Crj_200/customer/01_Highlights/",
"/jangeer/_cwc/ERJ170/customer/01_Highlights/"
};
ArrayList<String> outPut = new ArrayList<>(); // holds the output, i.e. the distinct lines
Arrays.stream(input).distinct().forEach(x -> outPut.add(x)); // Java 8: distinct() drops repeated elements of the stream
outPut.forEach(System.out::println); // printed here only as an example; write the list back to the file using your own implementation
}
The output when I ran this method was:
/book1/_cwc/B737/customer/Special_Reports/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/plots/
/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/
/jangeer/_cwc/Crj_200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/01_Highlights/
/jangeer/_cwc/ERJ170/customer/01_Highlights/
EDIT
Non-Java 8 answer:
public static void removeDups() {
String[] input = new String[] {
"/book1/_cwc/B737/customer/Special_Reports/",
"/Airbook/_cwc/A330-200/customer/02_Watchlists/",
"/book1/_cwc/B737/customer/Special_Reports/",
"/jangeer/_cwc/Crj_200/customer/plots/",
"/Airbook/_cwc/A330-200/customer/02_Watchlists/",
"/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/",
"/jangeer/_cwc/Crj_200/customer/02_Watchlists/",
"/jangeer/_cwc/Crj_200/customer/01_Highlights/",
"/jangeer/_cwc/ERJ170/customer/01_Highlights/"
};
LinkedHashSet<String> output = new LinkedHashSet<String>(Arrays.asList(input)); // output holds the unique strings in their original order
}
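To connect this to an actual file, here is a minimal sketch that reads the lines, passes them through a LinkedHashSet and writes the result out. It only uses Java 7 APIs. The class name FileDedup, the method names and the output path E:\Book_dedup.txt are assumptions made for illustration; the input path is the one from the question. Like the snippets above, it treats two lines as duplicates only when the whole line matches.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class FileDedup {
    // Returns the distinct lines, keeping the order in which they first appear.
    static Set<String> distinctLines(List<String> lines) {
        return new LinkedHashSet<String>(lines);
    }

    // Reads `in`, drops repeated lines, and writes the remaining lines to `out`.
    static void dedupeFile(Path in, Path out) throws IOException {
        List<String> lines = Files.readAllLines(in, StandardCharsets.UTF_8);
        Files.write(out, distinctLines(lines), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Input path taken from the question; the output path is hypothetical.
        dedupeFile(Paths.get("E:\\Book.txt"), Paths.get("E:\\Book_dedup.txt"));
    }
}
```

Files.readAllLines loads the whole file into memory, which should be fine for modest files; for very large inputs a line-by-line reader may be a better fit.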
Use a Set<String> instead of an ArrayList<String>.
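Duplicates aren't allowed in a Set, so if you add every line to it and then read them back out, you'll have only distinct strings. A plain HashSet does not keep insertion order; use a LinkedHashSet if the original line order matters.
Performance-wise it's also quicker than your nested for-loop: a hash-based Set checks membership in roughly constant time, whereas the nested loops re-scan all earlier lines for every new line.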
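The question's code compares only the 1st and 5th "/"-separated tokens of each line, and its expected output drops the ERJ170 line, which suggests that two lines count as duplicates when those two tokens match. Under that assumption, a Set of keys can replace the nested loops. The following is a rough sketch, not a definitive implementation: the names KeyDedup, keyOf and dedupe are made up for illustration, and the fallback for lines with fewer than five tokens (using the whole line as the key) is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.StringTokenizer;

class KeyDedup {
    // Builds a key from the 1st and 5th "/"-separated tokens.
    // Assumption: lines with fewer than five tokens are keyed by the whole line.
    static String keyOf(String line) {
        StringTokenizer st = new StringTokenizer(line, "/");
        String first = null;
        String fifth = null;
        for (int i = 1; st.hasMoreTokens(); i++) {
            String token = st.nextToken();
            if (i == 1) first = token;
            if (i == 5) fifth = token;
        }
        if (fifth == null) return "line:" + line;
        return "key:" + first + "/" + fifth;
    }

    // Keeps the first line seen for each key, in input order.
    static List<String> dedupe(List<String> lines) {
        Set<String> seen = new HashSet<String>();
        List<String> kept = new ArrayList<String>();
        for (String line : lines) {
            // Set.add returns false when the key is already present.
            if (seen.add(keyOf(line))) kept.add(line);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "/book1/_cwc/B737/customer/Special_Reports/",
            "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
            "/book1/_cwc/B737/customer/Special_Reports/",
            "/jangeer/_cwc/Crj_200/customer/plots/",
            "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
            "/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/",
            "/jangeer/_cwc/Crj_200/customer/02_Watchlists/",
            "/jangeer/_cwc/Crj_200/customer/01_Highlights/",
            "/jangeer/_cwc/ERJ170/customer/01_Highlights/");
        for (String line : dedupe(input)) System.out.println(line);
    }
}
```

Run on the sample input, this keeps six lines and drops the ERJ170 line, because its 1st and 5th tokens (jangeer, 01_Highlights) match the Crj_200 line before it. If whole-line comparison is what you want instead, add the line itself to the Set rather than the key.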