
How can I find the number of unique characters in a string?

I have not found anything specifically for this purpose.

I am trying to write a function that counts the occurrences of each character in a string, so that at the end I can subtract the repeats from the length and find out how many different characters the string uses.

I've tried a nested loop: the outer loop walks the string, and the inner loop scans it and stores the character if it does not appear elsewhere in the string:

size_t CountUniqueCharacters(char *str)
{
    int i,j;
    char unique[CHAR_MAX];
    for(i=strlen(str); i>=0; i--)
    {
        for(j=strlen(str); j>=0; j--)
        {
            if(str[i] != unique[j])
                unique[j] = str[i];
        }
    }
    return strlen(unique);
}

This didn't work well.

This is useful if you want to stop someone from typing lazy names such as "aaaaaaaaaaaaa".

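Here's a simple C++ solution. This method has O(n) complexity on average:

#include <string>
#include <unordered_map>
using namespace std;

// Counts distinct characters by using each character as a map key.
int countDistinct(const string& s)
{
    unordered_map<char, int> m;  // character -> number of occurrences

    for (size_t i = 0; i < s.length(); i++) {
        m[s[i]]++;
    }

    return static_cast<int>(m.size());  // one key per distinct character
}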

This method has O(n^2) complexity, but it's very possible (though a bit more complex) to do this in O(n) .

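#include <stdbool.h>
#include <string.h>

int CountUniqueCharacters(const char* str){
    int count = 0;
    size_t len = strlen(str);   // compute the length once, not on every iteration

    for (size_t i = 0; i < len; i++){
         bool appears = false;
         for (size_t j = 0; j < i; j++){   // scan the characters before position i
              if (str[j] == str[i]){
                  appears = true;
                  break;
              }
         }

         if (!appears){   // first occurrence of this character
             count++;
         }
    }

    return count;
}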

The method iterates over all the characters in the string - for each character, it checks whether the same character appeared at any earlier position. If it didn't, then this is the first occurrence of a new character, and the count is incremented.

Well, you can use a HashSet or unordered_set for this purpose, but each lookup or insertion has a worst-case time complexity of O(N). Hence, it's best to use an array of 256 memory locations, arr[256], indexed by the character value. Each lookup is then O(1), so the whole string is processed in O(n) time with O(256) ~ O(1) extra space.
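The array idea can be sketched in C roughly as follows. This is only an illustration: the name `count_unique_chars` is made up, and the table is sized with `UCHAR_MAX + 1` (256 on typical platforms) rather than a hard-coded 256.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: counts how many different char values occur in str. */
size_t count_unique_chars(const char *str)
{
    bool seen[UCHAR_MAX + 1] = { false };  /* one flag per possible char value */
    size_t count = 0;

    assert(str != NULL);

    for (; *str != '\0'; str++) {
        unsigned char c = (unsigned char)*str;  /* avoids a negative index if char is signed */
        if (!seen[c]) {                         /* first time this value is met */
            seen[c] = true;
            count++;
        }
    }
    return count;
}
```

With this sketch, `count_unique_chars("hello")` gives 4 and `count_unique_chars("aaaaaaaaaaaaa")` gives 1.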

I find the following way of counting distinct characters very simple, and it runs in O(n). The logic is to traverse the character array and set each character's entry to 1; if the character repeats, the entry is simply overwritten with 1 again. After the traversal, sum all the entries.

#include <limits.h>

#define MAX_CHAR (UCHAR_MAX + 1)   /* one slot per possible character value */

int count_distinc_char(const char *a){
     int c_arr[MAX_CHAR] = {0};
     int i, count = 0;
     for( i = 0; a[i] != '\0'; i++){
         c_arr[(unsigned char)a[i]] = 1;   /* index by character value, never negative */
     }    
     for( i = 0; i < MAX_CHAR; i++){
         count += c_arr[i];                /* each distinct character contributes 1 */
     }
     return count;
}

Create a linked list to store the characters found in the string and their occurrences, with the node structure as follows:

struct tagCharOccurence 
{
    char ch;
    unsigned int iCount;
    struct tagCharOccurence *pNext;   /* link to the next node in the list */
};

Now read the characters of the string one by one. As you read each character, check whether it is already present in your linked list. If it is, increase its count; if it is not, insert a new node with 'ch' set to the character just read and the count initialized to one.

In this way you get the number of occurrences of each character in a single pass over the string. You can then use the linked list to print each character as many times as it was encountered, and the number of nodes in the list is the number of unique characters.
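A minimal sketch of the linked-list approach, under a few assumptions: the names `CharNode`, `build_char_list` and `free_char_list` are invented for illustration, new nodes are pushed at the front of the list for simplicity, and the number of nodes is reported through an output parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* List node: one entry per distinct character seen so far. */
struct CharNode {
    char ch;
    unsigned int count;
    struct CharNode *next;
};

/* Releases every node of the list. */
void free_char_list(struct CharNode *head)
{
    while (head != NULL) {
        struct CharNode *next = head->next;
        free(head);
        head = next;
    }
}

/* Builds the occurrence list for str and stores the number of nodes
   (distinct characters) in *distinct. Returns NULL for an empty string
   or if an allocation fails. */
struct CharNode *build_char_list(const char *str, size_t *distinct)
{
    struct CharNode *head = NULL;

    assert(str != NULL && distinct != NULL);
    *distinct = 0;

    for (; *str != '\0'; str++) {
        struct CharNode *node = head;
        while (node != NULL && node->ch != *str)   /* search the list */
            node = node->next;

        if (node != NULL) {
            node->count++;                         /* character already known */
        } else {
            node = malloc(sizeof *node);           /* first occurrence */
            if (node == NULL) {
                free_char_list(head);
                *distinct = 0;
                return NULL;
            }
            node->ch = *str;
            node->count = 1;
            node->next = head;                     /* push at the front */
            head = node;
            (*distinct)++;
        }
    }
    return head;
}
```

Note that the string is read only once, but every character still triggers a search of the list, so the total work grows with the string length times the number of distinct characters.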

I just came across this question while looking for some other stuff on Stack Overflow, but I'll still post a solution that might be helpful to some:

This approach is also used in an implementation of Huffman coding. There you need to know the frequency of each character, which is a bit more than what you need.

#include <climits>
const int UniqueSymbols = 1 << CHAR_BIT;
const char* SampleString = "this is an example for huffman encoding";

The left shift operator shifts its left-hand side (i.e. 1) CHAR_BIT places to the left, which multiplies it by 2^CHAR_BIT. On most computers CHAR_BIT is 8, so the result is 256, the number of distinct values a char can hold.

and in your main you have

int main() {
    // Build frequency table
    int frequencies[UniqueSymbols] = {0};
    const char* ptr = SampleString;
    while (*ptr != '\0') {
        ++frequencies[static_cast<unsigned char>(*ptr++)];  // cast avoids a negative index when char is signed
    }
}

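I found it quite minimal and helpful. The only downside is that frequencies always has 256 entries here. Counting the unique characters is then just a matter of counting the nonzero entries; entries equal to 1 mark the characters that occur exactly once.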
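The counting step on top of the frequency table might look like this in C. It is a sketch only: `count_symbols` and `UNIQUE_SYMBOLS` are hypothetical names, and `1 << CHAR_BIT` is assumed to fit in an `int`, which holds for the usual `CHAR_BIT` of 8.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

enum { UNIQUE_SYMBOLS = 1 << CHAR_BIT };   /* 256 when CHAR_BIT is 8 */

/* Hypothetical helper: builds the frequency table for s, then reports
   how many characters occur at all (*distinct) and how many occur
   exactly once (*once). */
void count_symbols(const char *s, size_t *distinct, size_t *once)
{
    unsigned int freq[UNIQUE_SYMBOLS] = { 0 };
    int i;

    assert(s != NULL && distinct != NULL && once != NULL);

    for (; *s != '\0'; s++)
        ++freq[(unsigned char)*s];         /* cast keeps the index non-negative */

    *distinct = 0;
    *once = 0;
    for (i = 0; i < UNIQUE_SYMBOLS; i++) {
        if (freq[i] > 0)
            (*distinct)++;                 /* character appears at least once */
        if (freq[i] == 1)
            (*once)++;                     /* character appears exactly once */
    }
}
```

For "hello" this reports 4 distinct characters, of which 3 (h, e, o) occur exactly once.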

Here is the source code of a C program to count the number of unique words. The program was successfully compiled and run on a Linux system.

#include <stdio.h>
#include <string.h>

int main(void)
{
    int i = 0, e, j, d, k, space = 0;
    int unique = 0;                   // number of distinct words stored in c
    char a[50], b[50][50], c[50][50]; // sized so that 49 input characters can never overflow

    printf("Read a string:\n");
    if (scanf("%49[^\n]", a) != 1)    // read one line, at most 49 characters
        return 1;

    for (i = 0; a[i] != '\0'; i++)    // loop to count the number of words
    {
        if (a[i] == ' ')
            space++;
    }

    i = 0;
    for (j = 0; j < (space + 1); i++, j++)    // loop to store each word in a 2D array
    {
        k = 0;
        while (a[i] != '\0' && a[i] != ' ')
            b[j][k++] = a[i++];
        b[j][k] = '\0';
    }

    for (e = 0; e < j; e++)    // loop to copy each word into c unless it is already present
    {
        if (b[e][0] == '\0')   // skip empty words produced by repeated spaces
            continue;
        for (d = 0; d < unique; d++)
        {
            if (strcmp(c[d], b[e]) == 0)
                break;
        }
        if (d == unique)       // not found among the words stored so far
            strcpy(c[unique++], b[e]);
    }

    printf("\nNumber of unique words in %s are:%d\n", a, unique);
    return 0;
}


 