C - Print the most frequent strings

Over the past few days I have been posting code from an exercise I am working on. I thought I had finally finished it, but it doesn't work. The exercise takes as input:

- N, an integer giving the number of strings to read
- K, an integer
- N strings

The strings may contain duplicates. The output is the K most frequent strings, in decreasing order of frequency.

Example test set:

Input:

6
2
mickey
mouse
mickey
hello
mouse
mickey

Output:

mickey // Has freq 3
mouse // Has freq 2

I hope I have explained the exercise clearly. Here is my attempt:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct _stringa {
    char* string;
    int freq;
} stringa;


int compare(const void *elem1, const void *elem2) {
    stringa *first = (stringa *)elem1;
    stringa *second = (stringa *)elem2;

    if (first->freq < second->freq) {
        return -1;
    } else if (first->freq > second->freq) {
        return 1;
    } else {
        return 0;
    }
}

int BinarySearch(stringa** array, char* string, int left, int right) {
    int middle;
    if (left==right) {
        if (strcmp(string,array[left]->string)==0) {
            return left;
        } else {
            return -1;
        }
    }
    middle = (left+right)/2;
    if ((strcmp(string,array[middle]->string)<0) || (strcmp(string,array[middle]->string)==0) ) {
        return BinarySearch(array, string, left, middle);
    } else {
        return BinarySearch(array, string, middle+1, right);
    }

}


int main (void)
{
    char value[101];
    int n = 0;
    int stop;
    scanf("%d", &n); // Number of strings
    scanf("%d", &stop); // number of the most frequent strings to print

    stringa **array = NULL;
    array = malloc ( n * sizeof (struct _stringa *) );

    int i = 0;

    for (i=0; i<n; i++) {

        array[i] = malloc (sizeof (struct _stringa));
        array[i]->string = malloc (sizeof (value)); 

        scanf("%s", value);

        int already;
        /* Binary-search the previous positions of the array for the string.
           If it is not present, copy the string into the array; otherwise
           use the returned position to update that element's frequency. */
        already = BinarySearch(array, value, 0, i);


        if (already==-1) {
            strcpy(array[i]->string,value); 
            array[i]->freq = 1;
        } else {
            array[already]->freq += 1;
        }

    }


    stringa **newarray = NULL; // New struct array of strings
    newarray = malloc ( n * sizeof (struct _stringa *) );

    int k = 0;
    for (i=0; i<n; i++) { // I use this loop to copy the element that don't have a frequency == 0
        if (array[i]->freq != 0) {
            newarray[k] = malloc(sizeof(struct _stringa));
            newarray[k] = malloc(sizeof(value));
            newarray[k]->string = array[i]->string;
            newarray[k]->freq = array[i]->freq;
            k++;
        }
    }
    qsort(newarray, n, sizeof(stringa*), compare);

    i=0;
    while ((newarray[i]!= NULL) && (i<k)) {
        printf("%s ", newarray[i]->string);
        printf("%d\n", newarray[i]->freq);
        i++;
    }


// Freeing operations        

    while (--n >= 0) {
        if (array[n]->string) free (array[n]->string);
        if (array[n]) free (array[n]);
    }

    if (array) free (array);
    if (newarray) free (newarray);

    return 0;
}

Thank you in advance to anyone who will have the time and patience to read this code.

EDIT:

I forgot to say what isn't working. If I skip the qsort for debugging and use this input, for example:

5
2 // arbitrary, since I still have to write the 'print the K strings' part
hello
hello
hello
hello
hello

it prints:

hello 3 // freq
hello 2 // freq

So it doesn't work properly. As suggested in the comments, the binary search is flawed because it only works on a sorted array. I could re-sort the array after every insertion, but I think that would be counter-productive. How else can I check whether a string is already present in the array?

If you want an efficient method without sorting, use a hash table. Otherwise, simply put each unique string in an array and scan it linearly: simple and reliable.
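The linear-scan approach could be sketched roughly as follows. This is only a minimal illustration, not the asker's code: the `entry` struct, `MAX_LEN`, `find_entry` and `add_word` are hypothetical names introduced here, and it assumes each string fits in the same 101-byte buffer the question uses.

```c
#include <assert.h>
#include <string.h>

#define MAX_LEN 101  /* assumption: same size as value[] in the question */

/* Hypothetical record: one unique string plus its occurrence count. */
typedef struct {
    char text[MAX_LEN];
    int freq;
} entry;

/* Linear scan: index of `word` among the first `count` entries, or -1. */
static int find_entry(const entry *table, int count, const char *word) {
    for (int i = 0; i < count; i++) {
        if (strcmp(table[i].text, word) == 0) {
            return i;
        }
    }
    return -1;
}

/* Record one occurrence of `word` and return the new number of unique
   entries. The caller must make sure the table has room for one more. */
static int add_word(entry *table, int count, const char *word) {
    assert(strlen(word) < MAX_LEN);
    int pos = find_entry(table, count, word);
    if (pos >= 0) {
        table[pos].freq++;          /* already seen: bump its counter */
        return count;
    }
    strcpy(table[count].text, word); /* first occurrence: append it */
    table[count].freq = 1;
    return count + 1;
}
```

In the reading loop you would call `count = add_word(table, count, value);` after each `scanf`, with `table` sized for N entries. Because only unique strings are stored, there are no slots with an uninitialised `freq`, and no order has to be maintained for the lookup to work.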

On modern hardware, this kind of scan is actually fast thanks to caches and minimal indirection. For small numbers of items, an insertion sort is often more efficient in practice than qsort. Timsort, for instance, which is stable and avoids the poor performance some qsort implementations show on nearly sorted data, mixes merge sort and insertion sort to achieve O(n log n) without pathological cases on real data.
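For the final step, an insertion sort over the unique entries, ordered by decreasing frequency, might look like the sketch below. It is an assumption-laden illustration rather than a definitive solution: `entry`, `sort_by_freq_desc` and `print_top` are hypothetical names, and the exercise does not say how ties should be ordered, so this version simply keeps them in first-seen order.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_LEN 101  /* assumption: same size as value[] in the question */

/* Hypothetical record: one unique string plus its occurrence count. */
typedef struct {
    char text[MAX_LEN];
    int freq;
} entry;

/* Insertion sort, highest frequency first. The strict `<` comparison
   makes it stable, so entries with equal frequency keep their order. */
static void sort_by_freq_desc(entry *table, int count) {
    assert(count >= 0);
    for (int i = 1; i < count; i++) {
        entry key = table[i];
        int j = i - 1;
        while (j >= 0 && table[j].freq < key.freq) {
            table[j + 1] = table[j]; /* shift less frequent entries right */
            j--;
        }
        table[j + 1] = key;
    }
}

/* Print at most k strings from a sorted table; returns how many were printed. */
static int print_top(const entry *table, int count, int k) {
    int limit = (k < count) ? k : count;
    for (int i = 0; i < limit; i++) {
        printf("%s\n", table[i].text);
    }
    return limit;
}
```

Sorting the structs directly, rather than an array of pointers to them, also sidesteps the extra level of indirection that a qsort comparator would have to account for.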
