
Sorting files with MergeSort in C++

I'm trying to implement my own MergeSort, but I've run into some problems. Maybe someone can help me a little.

I have a big file with some info separated by commas (Name,city,mail,telf). I would like to apply merge sort to sort it, because I suppose that the client computer won't have enough memory to do it in one go.

So I split it into files of MAX_CUSTOMERS lines each and sort them individually. Everything is correct up to this point, but when I try to merge the first two files I get all kinds of problems: some records are repeated and others disappear. Here's my code:

void MergeSort(string file1Name, string file2Name,string name){
    printf("Enter MERGE SORT %s AND %s\n",file1Name.c_str(),file2Name.c_str());
    string temp;
    string fileName;
    string lineFile1, lineFile2;
    bool endFil1 = false, endFil2 = false;
    int numCust1 = 0;
    int numCust2 = 0;
    int x1 = 0, x2 = 0;

    ifstream file1;
    file1.open(file1Name.c_str());
    ifstream file2;
    file2.open(file2Name.c_str());

    ofstream mergeFile;
    fileName = "customers_" +name +".txt";
    cout << "Result file " << fileName << endl;
    mergeFile.open("temp.txt");

    getline(file1,lineFile1);
    getline(file2,lineFile2);

    while(!endFil1 && !endFil2){
        if(CompareTelf(lineFile1,lineFile2)==1){
            mergeFile << lineFile1 << endl;
            if(!getline(file1,lineFile1)){
                cout << lineFile1 << endl;
                cout << "1st file end" << endl;
                endFil1 = true;
            }
        }else{
            mergeFile << lineFile2 << endl;
            if(!getline(file2,lineFile2)){
                cout << lineFile2 << endl;
                cout << "2nd file end" << endl;
                endFil2 = true;
            }
        }
    }
    if(endFil1){
        //mergeFile << lineFile2 << endl;
        while(getline(file2,lineFile2)){
            mergeFile << lineFile2 << endl;
        }
    }else{
        //mergeFile << lineFile1 << endl;
        while(getline(file1,lineFile1)){
            mergeFile << lineFile1 << endl;
        }
    }

    file1.close();
    file2.close();
    mergeFile.close();
    rename("temp.txt",fileName.c_str());
    return;
}

Customer SplitLine(string line){
    string splitLine;
    string temp;
    Customer cust;
    int actProp = 0;
    int number;
    istringstream readLineStream(line); //convert String readLine to Stream readLine

    while(getline(readLineStream,splitLine,',')){
        if (actProp == 0)cust.name = splitLine;
        else if (actProp == 1)cust.city = splitLine;
        else if (actProp == 2)cust.mail = splitLine;
        else if (actProp == 3)cust.telf = atoi(splitLine.c_str());
        actProp++;
    }
    //printf("Customer read: %s, %s, %s, %i\n",cust.name.c_str(), cust.city.c_str(), cust.mail.c_str(), cust.telf);

    return cust;
}

int CompareTelf(string str1, string str2){
    Customer c1 = SplitLine(str1);
    Customer c2 = SplitLine(str2);

    if(c1.telf<c2.telf)return 1; //return 1 if 1st string is more important than second, otherwise return -1
    else return -1;
}

struct Customer{
    string name;
    string city;
    string mail;
    long telf;
};

If you have any questions about the code, just ask! I tried to make the variable names as descriptive as possible!

Thanks a lot.
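For context, the split-and-sort phase described in the question (cut the input into runs of at most MAX_CUSTOMERS lines and sort each run in memory) could be sketched roughly as follows. The function name SplitIntoSortedRuns, the `<prefix><n>.txt` naming scheme and the plain lexicographic ordering are assumptions for illustration; the question orders records by phone number through CompareTelf instead.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <istream>
#include <string>
#include <vector>

// Hypothetical helper: reads `in` in chunks of at most maxLines lines
// (maxLines is assumed to be > 0), sorts each chunk in memory and writes it
// to its own file (prefix + "0.txt", prefix + "1.txt", ...).
// Returns the names of the run files in the order they were written.
// Lines are compared lexicographically here; a comparator that parses
// the phone field would be needed to sort by telf.
std::vector<std::string> SplitIntoSortedRuns(std::istream& in, std::size_t maxLines,
                                             const std::string& prefix) {
    std::vector<std::string> runNames;
    std::vector<std::string> chunk;
    std::string line;

    // Sort the buffered chunk, write it out as one run, then empty the buffer.
    auto flush = [&]() {
        if (chunk.empty()) return;
        std::sort(chunk.begin(), chunk.end());
        std::string name = prefix + std::to_string(runNames.size()) + ".txt";
        std::ofstream out(name);  // closed (and flushed) when it goes out of scope
        for (const std::string& l : chunk) out << l << '\n';
        runNames.push_back(name);
        chunk.clear();
    };

    while (std::getline(in, line)) {
        chunk.push_back(line);
        if (chunk.size() == maxLines) flush();  // memory bound reached
    }
    flush();  // last, possibly shorter, run
    return runNames;
}
```

Only one chunk of at most maxLines lines is held in memory at a time; the runs are then merged pairwise, which is the step the rest of this thread is about.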

Your code seems quite good, but it has several flaws and one important omission.

One of the minor flaws is the lack of initialization of the Customer structure: you didn't provide a constructor for the struct, and you do no explicit initialization of the cust variable. The string members are properly initialized (to empty strings) by the string class constructor, but long telf is left with an indeterminate value.
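A minimal fix, assuming a C++11 or later compiler, is a default member initializer. The member names are the ones from the question's struct; only the `= 0` is new.

```cpp
#include <string>

// Customer with a default member initializer (C++11): telf starts at 0
// instead of an indeterminate value, so a line that never sets it still
// yields a predictable (if wrong) sort key.
struct Customer {
    std::string name;   // std::string members default to ""
    std::string city;
    std::string mail;
    long telf = 0;      // would be indeterminate without the initializer
};
```

Alternatively, declaring the variable as `Customer cust{};` in SplitLine value-initializes it, which also sets telf to 0 even without the initializer.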

Another one is the lack of format checking when splitting an input line. Are you sure that every input line has the same format? If some lines have too many commas (say, a comma inside a name), the loop may incorrectly assign 'email' data to the 'telf' member...
OTOH, if there are too few commas, the 'telf' member may remain uninitialized, with an unpredictable value...
Combined with the first flaw, this may lead to an incorrect order of the output data.
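One way to add that check is to split the line first and only accept it if it has exactly the four expected fields. SplitFields is a hypothetical helper, and the four-field layout (name, city, mail, telf) is taken from the question's file format.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a "name,city,mail,telf" line on commas and
// reports whether it has exactly the four expected fields. Lines with an
// extra comma inside a field, or with missing fields, are rejected instead
// of being parsed into the wrong members.
bool SplitFields(const std::string& line, std::vector<std::string>& fields) {
    fields.clear();
    std::istringstream stream(line);
    std::string field;
    while (std::getline(stream, field, ',')) {
        fields.push_back(field);
    }
    return fields.size() == 4;
}
```

The caller can then skip or report a rejected line instead of sorting on garbage.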

Similar problems arise when you use the atoi function: it returns int, but your variable is long. I suppose you have chosen long because of the expected range of values; if so, converting the input to int may lose significant digits. If the value does not fit in an int, the behavior of atoi is undefined, so in practice you may get a truncated or otherwise wrong number. Either way the sorting will be wrong, so you'd better use atol instead (or strtol, which also lets you detect conversion errors).
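A stricter conversion could look like the sketch below, assuming C++11 for `std::strtoll`. ParseTelf is a hypothetical name, and long long is chosen here instead of long because long is only 32 bits on some platforms (for example 64-bit Windows), where values above 2147483647 would not fit.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical helper: converts a phone-number field to long long and
// reports failure instead of silently returning a bogus value.
bool ParseTelf(const std::string& text, long long& value) {
    if (text.empty()) return false;
    errno = 0;
    char* end = nullptr;
    long long result = std::strtoll(text.c_str(), &end, 10);
    if (errno == ERANGE) return false;          // does not fit in long long
    if (end == text.c_str()) return false;      // no digits at all
    if (*end != '\0') return false;             // trailing junk after the number
    value = result;
    return true;
}
```

`std::stoll` would work too, but it reports errors by throwing exceptions and accepts trailing junk unless you check its `pos` output.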

The next issue is reading the first line from both input files: you don't check whether getline() succeeded. If an input file is empty, the corresponding lineFile1/lineFile2 string will be empty, but endFil1/endFil2 will not reflect that; it will still be false. So you again end up comparing invalid data.

Finally, the main problem. Assume the contents of file1 are 'greater than' (that is, go after) the whole of file2. Then the first line stored in lineFile1 makes CompareTelf() return -1 every time, so the main loop copies the whole of file2 into the output, and then? The final while() loop starts with getline(file1,lineFile1), thus discarding the first line of file1!
A similar result happens with files consisting of records (A,C) and (B), which should be merged as (A,B,C): first A and B are read in, then A is saved and C is read in, then B is saved and the end of file 2 is detected. Then while(getline(...)) tries to read the line after C, immediately hits the end of file 1 and terminates the loop without ever writing C. Record C gets lost.
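The lost record can be reproduced with a stream-based copy of the question's merge loop. This is a hypothetical test harness, not the original function: the file names, the rename step and CompareTelf are replaced by std::istream/std::ostream parameters and a plain string comparison, so that records can be single letters.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Same control flow as the question's MergeSort, with CompareTelf replaced
// by operator< on whole lines. Kept buggy on purpose.
void BuggyMerge(std::istream& file1, std::istream& file2, std::ostream& mergeFile) {
    std::string lineFile1, lineFile2;
    bool endFil1 = false, endFil2 = false;

    std::getline(file1, lineFile1);  // result not checked
    std::getline(file2, lineFile2);

    while (!endFil1 && !endFil2) {
        if (lineFile1 < lineFile2) {
            mergeFile << lineFile1 << '\n';
            if (!std::getline(file1, lineFile1)) endFil1 = true;
        } else {
            mergeFile << lineFile2 << '\n';
            if (!std::getline(file2, lineFile2)) endFil2 = true;
        }
    }
    // The tail copy starts with a fresh getline, so the line already held
    // in lineFile1/lineFile2 is never written: this is the main bug.
    if (endFil1) {
        while (std::getline(file2, lineFile2)) mergeFile << lineFile2 << '\n';
    } else {
        while (std::getline(file1, lineFile1)) mergeFile << lineFile1 << '\n';
    }
}

// Convenience wrapper: merge two in-memory "files" and return the result.
std::string MergeStrings(const std::string& a, const std::string& b) {
    std::istringstream in1(a), in2(b);
    std::ostringstream out;
    BuggyMerge(in1, in2, out);
    return out.str();
}
```

`MergeStrings("A\nC\n", "B\n")` returns `"A\nB\n"`, and `MergeStrings("C\n", "A\nB\n")` also returns `"A\nB\n"`: in both cases C never reaches the output.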
In general, when the main merging loop while(!endFil1 && !endFil2) exhausts one of the files, the first unsaved line of the other file gets discarded. To avoid this, you need to store the result of the first read:

endFil1 = ! getline(file1,lineFile1);
endFil2 = ! getline(file2,lineFile2);

then, after the main loop, copy the tail of whichever input file is left, starting with the line that has already been read but not yet saved:

while(!endFil1) {
    mergeFile << lineFile1 << endl;
    endFil1 = !getline(file1,lineFile1);
}
while(!endFil2) {
    mergeFile << lineFile2 << endl;
    endFil2 = !getline(file2,lineFile2);
}
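Putting these fixes together, a merge step could look like the sketch below. It is not a drop-in replacement for the question's function: it takes std::istream/std::ostream instead of file names (you would pass std::ifstream/std::ofstream objects), and TelfKey is a hypothetical helper that assumes the phone number is the fourth comma-separated field and maps a missing or non-numeric field to 0.

```cpp
#include <cstdlib>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical key function: phone number assumed to be the 4th field.
// A missing or non-numeric field yields 0 (std::strtoll returns 0 when
// no digits can be converted).
long long TelfKey(const std::string& line) {
    std::istringstream stream(line);
    std::string field;
    for (int i = 0; i < 4; ++i) {
        if (!std::getline(stream, field, ',')) return 0;
    }
    return std::strtoll(field.c_str(), nullptr, 10);
}

// Merges two streams whose lines are already sorted by TelfKey.
void MergeSorted(std::istream& file1, std::istream& file2, std::ostream& mergeFile) {
    std::string lineFile1, lineFile2;
    // Record whether the first reads succeeded (handles empty inputs).
    bool endFil1 = !std::getline(file1, lineFile1);
    bool endFil2 = !std::getline(file2, lineFile2);

    while (!endFil1 && !endFil2) {
        // On equal keys take file1's line first, which keeps the merge stable.
        if (TelfKey(lineFile1) <= TelfKey(lineFile2)) {
            mergeFile << lineFile1 << '\n';
            endFil1 = !std::getline(file1, lineFile1);
        } else {
            mergeFile << lineFile2 << '\n';
            endFil2 = !std::getline(file2, lineFile2);
        }
    }
    // Copy the tails, starting with the line that is already in memory.
    while (!endFil1) {
        mergeFile << lineFile1 << '\n';
        endFil1 = !std::getline(file1, lineFile1);
    }
    while (!endFil2) {
        mergeFile << lineFile2 << '\n';
        endFil2 = !std::getline(file2, lineFile2);
    }
}
```

With files, you would open the two runs with std::ifstream and "temp.txt" with std::ofstream, call MergeSorted, then close the streams and rename the result as the question does. Like the question's CompareTelf, this re-parses each line on every comparison; caching the key alongside the line would avoid that.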
