简体   繁体   中英

sort and remove duplicates from int array in c

I am learning C and came over the topic of sorting. I wrote a comp() function in and used qsort to sort an array of int . Now for the next task I need to remove the duplicates from the array.
Is it possible to sort and remove duplicates at the same time?

#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>    
int indexes[10] = { 0, 98, 45, 65, 45, 98, 78, 56, 65, 45 };

int comp(const void * elem1, const void * elem2) {

    int f = *((int*) elem1);
    int s = *((int*) elem2);

    if (f > s) {    
        return 1;
    }    
    if (f < s) {    
        return -1;
    }    
    return 0;
}

void printIndexArray() {    
    int i = 0;    
    for (i = 0; i < 10; i++) {    
        printf("i is %d\n", indexes[i]);    
    }
}

int main() {    
    qsort(indexes, sizeof(indexes) / sizeof(int), sizeof(int), comp);    
    printIndexArray();    
    return 0;
}

Since your numbers are already sorted, removing dupes is easy. In C++, it's even built in as std::unique :

http://en.cppreference.com/w/cpp/algorithm/unique

Assuming you want to do it yourself, you can do it the same way unique does it:

int* unique (int* first, int* last)
{
  if (first==last) return last;

  int* result = first;
  while (++first != last)
  {
    if (!(*result == *first)) 
      *(++result)=*first;
  }
  return ++result;
}

Yes

This can be achieved by mergesort . If both left and right are the same just merge the one value

That's the code that removes the duplicates using mergesort. This snippet of code does the removing work:

else if(a[p1] == a[p2])
{
    merged[p] = a[p1];
    p1++;
    p2++;
}

That's the iterative merge sort while the recursive version would be easier.

#include <stdio.h>
#include <stdlib.h>

#define min(a,b) (((a) < (b)) ? (a) : (b))

int indexes[10] = { 0, 98, 45, 65, 45, 98, 78, 56, 65, 45 };

void merge(int *a, int s, int m, int e)
{
    int p1 = s;
    int p2 = m + 1;
    int * merged = (int*)malloc(sizeof(int) * (e - s + 1));
    int p = 0;
    while(p1 < m + 1 && p2 < e + 1)
    {
        if(a[p1] > a[p2])
        {
            merged[p] = a[p2];
            p2++;
        }
        else if(a[p1] == a[p2])
        {
            merged[p] = a[p1];
            p1++;
            p2++;
        }
        else
        {
            merged[p] = a[p1];
            p1++;
        }
        p++;
    }

    while(p1 < m + 1)
    {
        merged[p++] = a[p1++];
    }

    while(p2 < e + 1)
        merged[p++] = a[p2++];

    int i;
    for(i = 0;i < (e -s+1); i++)
    {
        a[s + i] = merged[i];
    }

    free(merged);
}

void merge_sort(int *a, int n)
{
    int width;
    for(width = 1; width < n; width = 2 * width)
    {
        int i;
        for(i = 0; i < n; i = i + 2 * width)
        {
            merge(a, i, min(i + width - 1, n - 1), min(i + 2 * width - 1, n - 1) );
        }
    }
}

void printIndexArray()
{    
    int i = 0;    
    for(i = 0; i < 10; i++)
    {    
        printf("i is %d\n", indexes[i]);    
    }
}

int main()
{
    merge_sort(indexes, sizeof(indexes) / sizeof(int) );
    printIndexArray();
    return 0;
}

The short answer is: yes.

The long answer is: it is always possible, but the complexity to do it depends heavily on the algorithm you use.

The more complex algorithms like quick-sort, slow-sort, bucket-sort, and straight-radix-sort do not lend themselves to such an enhancement, because they rely on the data being in a consecutive array, that can implicitly be split into subarrays. So, when you detect a duplicate, you cannot easily take it out. Again, it is possible, but certainly not a problem for beginners.

The less complex in-place algorithms like bubble-sort, insertion-sort, and shell-sort make it relatively easy: you can just replace one of the duplicates you detect with a sentinel value that sorts greater than all legal values, and let it rise to the top. After that, you just need to scoop off the cream of sentinel values and you are done.

The algorithms that really lend themselves to removing duplicates, are the ones that use intermediate arrays that grow/shrink in the process; in these cases you can just shrink or skip growing one of these intermediate arrays when you detect a duplicate. Candidates are merge-sort and heap-sort.

Note, however, that it is more prudent to just sort the array, and eliminate duplicates in a second, separate step. Why? Because eliminating duplicates adds complexity to the inner loop of the sorting algorithm, which is of O(n*log(n)) in most relevant cases. But eliminating duplicates from a sorted array is an O(n) operation, making the split operation faster than the fused one.

#include <stdio.h>
#include <stdlib.h>

int indexes[10] = { 0, 98, 45, 65, 45, 98, 78, 56, 65, 45 };

size_t undup(int array[], size_t len)
{
size_t src,dst;

if (!len) return 0;
for (src=dst=1; src < len; src++) {
        if (array[dst-1] == array[src]) continue;
        array[dst++] = array[src];
        }
return dst;
}

int comp(const void * elem1, const void * elem2) {

    int f = *((int*) elem1);
    int s = *((int*) elem2);

    if (f > s)     return 1;
    if (f < s)     return -1;

    return 0;
}

void printIndexArray(size_t len) {
    size_t i = 0;
    for (i = 0; i < len; i++) {
        printf("array[%zu] is %d\n", i, indexes[i]);
    }
}

int main() {
    size_t len = 10;
    printf("Before sort\n" );
    printIndexArray(len);

    qsort(indexes, sizeof indexes / sizeof indexes[0], sizeof indexes[0], comp);
    printf("After sort\n" );
    printIndexArray(len);

    len = undup(indexes,10);
    printf("After undup\n" );
    printIndexArray(len);

    return 0;
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM