Most efficient way to match (a few) strings in C?

Question

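Our system needs to accept user input from a terminal and match it against a few known keyword strings (maybe 10).

We don't have the space or computing power for regular expressions and the like, so the code has to be small and fast.

Now, the nasty way to do this is: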

   // str is null-terminated, assume we know it's safe/sane here
   if(!strncmp(str,"hello",5))
   {
      do_hello();
   }
   else if(!strncmp(str,"world",5))
   {
      do_world();
   }
   else
   {
      meh(); // Wasn't a match
   }

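So, after a bit of googling and reading, I'm convinced that a better way is to pre-compute the hash of each match string as an int and then use a case statement:

// Assume hash() stops at the NUL terminator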
switch(hash(str))
{
   case HASH_OF_HELLO:
      do_hello();
      break;

   case HASH_OF_WORLD:
      do_world();
      break;

   default:
      meh();
      break;
}

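We can compute the HASH_OF_match values at compile time. This seems like a faster and more elegant way to pick a string from a relatively small set.

So: does this seem reasonable? Is there an obvious problem with doing it this way? Does anyone have a more elegant approach?

As a footnote, this is the nicest-looking hash algorithm I've seen this afternoon ;), credited to Dan Bernstein. It looks up to the job at hand.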

unsigned int
get_hash(const char* s)
{
    unsigned int hash = 0;
    int c;

    while((c = *s++))
    {
        // hash = hash * 33 ^ c 
        hash = ((hash << 5) + hash) ^ c;
    }

    return hash;
}
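Because a case label must be an integer constant expression, get_hash() itself cannot be called inside the switch labels, so the HASH_OF_* constants have to come from somewhere else. A minimal sketch of one way to produce them, assuming the get_hash() above is used unchanged: a small helper that formats a #define line for each keyword, to be pasted into a header. The name format_hash_define() is hypothetical and introduced here only for illustration.

```c
#include <ctype.h>
#include <stdio.h>

/* Same hash as in the question: hash * 33 ^ c, starting from 0. */
static unsigned int get_hash(const char *s)
{
    unsigned int hash = 0;
    int c;

    while ((c = *s++))
    {
        hash = ((hash << 5) + hash) ^ c;
    }
    return hash;
}

/*
 * Hypothetical helper: writes "#define HASH_OF_<KEYWORD> <value>u" into buf.
 * Assumes the keyword contains only characters that are valid in a macro
 * name and is shorter than 64 characters. Returns what snprintf returns.
 */
static int format_hash_define(char *buf, size_t size, const char *keyword)
{
    char upper[64];
    size_t i;

    for (i = 0; keyword[i] != '\0' && i < sizeof upper - 1; ++i)
    {
        upper[i] = (char)toupper((unsigned char)keyword[i]);
    }
    upper[i] = '\0';

    return snprintf(buf, size, "#define HASH_OF_%s %uu",
                    upper, get_hash(keyword));
}
```

Calling format_hash_define(buf, sizeof buf, "hello") and printing buf gives a line that can be pasted into a header. The constants have to be regenerated whenever the hash function or a keyword changes.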

Answer 1

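The problem with hashing is that an arbitrary string entered by the user may produce the same hash as one of your matches, and you'll end up executing the wrong thing. For a search set as small as 10, I'd just stick with the if-else approach. Or use an array of strings and an array of function pointers (assuming all the functions have the same signature) to select the function to execute.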

char const *matches[10] = {"first", "second", ..., "tenth"};
void (*fn[10])(void) = {&do_first, &do_second, ..., &do_tenth};

for( int i = 0; i < 10; ++i ) {
  if( strcmp( str, matches[i] ) == 0 ) {
    (*fn[i])();
    break; // stop at the first match
  }
}

Answer 2

What about using nested switch statements on only the last character, as in the Boyer-Moore string search algorithm?

http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm
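One possible reading of this suggestion, sketched under assumptions: switch on the last character of the input to narrow the candidates, then confirm with strcmp(). This borrows only the "look at the end first" idea and is not the Boyer-Moore algorithm itself. The enum and the function name match_keyword() are hypothetical, and only the two keywords from the question are covered.

```c
#include <string.h>

/* Hypothetical result type for illustration. */
enum keyword { KW_NONE, KW_HELLO, KW_WORLD };

/* Narrow by last character, then confirm with a full comparison. */
static enum keyword match_keyword(const char *str)
{
    size_t len = strlen(str);

    if (len == 0)
    {
        return KW_NONE;
    }

    switch (str[len - 1])
    {
        case 'o': /* only "hello" ends in 'o' */
            return strcmp(str, "hello") == 0 ? KW_HELLO : KW_NONE;

        case 'd': /* only "world" ends in 'd' */
            return strcmp(str, "world") == 0 ? KW_WORLD : KW_NONE;

        default:
            return KW_NONE;
    }
}
```

When several keywords share a last character, that case would need a nested switch on another character or a short chain of strcmp() calls.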

Answer 3

Sounds like you want to use gperf.

Answer 4

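Hashing and hash tables work best for large amounts of data. Since the number of input strings is known and limited, you could consider this approach:

Assume the known strings are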

const char* STR_TABLE [STR_N] =
{
  "hello",
  "world",
  "this",
  "is",
  "a",
  "number",
  "of",
  "ten",
  "test",
  "strings"
};

Then we can sort them alphabetically by hand before compilation, since a sorted table allows much faster searching. You can then use binary search.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define STR_N 10


const char* STR_TABLE [STR_N] =
{
  "a",
  "hello",
  "is",
  "number",
  "of",
  "strings",
  "ten",
  "test",
  "this",
  "world"
};


int ptr_strcmp (const void* str1, const void* str2)
{
  return strcmp(str1, *(const char**)str2);
}

int main()
{
  const char* user_input = "world"; // worst case
  const char** result;

  result = bsearch (user_input,
                    STR_TABLE,
                    STR_N,
                    sizeof(const char*),
                    ptr_strcmp);

  if(result != NULL)
  {
    printf("%s\n", *result);
  }
  else
  {
    printf("meh\n");
  }

}

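This would boil down to:

Compare "world" with "of", 1 comparison: 'w' != 'o'.

Compare "world" with "test", 1 comparison: 'w' != 't'.

Compare "world" with "this", 1 comparison: 'w' != 't'.

Compare "world" with "world", 5 comparisons.

The total number of comparisons is 8.

Of course there is some overhead code involved, for the checks against '\0' and for the binary search call. You will have to measure the various suggested approaches on your specific platform to find out which is best.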
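The example above only reports whether the input was found. A minimal sketch of how the same bsearch() lookup could also select an action, assuming each keyword is paired with a handler in a table kept sorted by name. The struct, the counters and the dispatch() function are hypothetical names used for illustration, and only two keywords are shown.

```c
#include <stdlib.h>
#include <string.h>

struct command
{
    const char *name;
    void (*handler)(void);
};

/* Hypothetical handlers; the counters only make the effect observable. */
static int hello_calls, world_calls;
static void do_hello(void) { ++hello_calls; }
static void do_world(void) { ++world_calls; }

/* Must stay sorted by name in strcmp() order for bsearch() to work. */
static const struct command COMMANDS[] =
{
    { "hello", do_hello },
    { "world", do_world },
};

/* bsearch() passes the key first and a table element second. */
static int command_cmp(const void *key, const void *elem)
{
    const struct command *cmd = elem;
    return strcmp((const char *)key, cmd->name);
}

/* Returns 1 if a handler was run, 0 if the input matched nothing. */
static int dispatch(const char *input)
{
    const struct command *hit = bsearch(input,
                                        COMMANDS,
                                        sizeof COMMANDS / sizeof COMMANDS[0],
                                        sizeof COMMANDS[0],
                                        command_cmp);
    if (hit == NULL)
    {
        return 0;
    }
    hit->handler();
    return 1;
}
```

A call such as dispatch(user_input) would replace the printf() branches in main(), with the "meh" case handled when it returns 0.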

Answer 5

A possible solution could be this:

struct keyword {
    unsigned int hash;
    const char *str;
    void (*job)();
};

//A table with our keywords with their corresponding hashes. If you could not
//compute the hash at compile time, a simple init() function at the beginning
//of your program could initialize each entry by using the value in 'str'
//You could also implement a dynamic version of this table (linked list of keywords)
//for extending your keyword table during runtime
struct keyword mykeywords[] = {
    {.hash = HASH_OF_HELLO, .str = "hello", .job = do_hello},
    {.hash = HASH_OF_WORLD, .str = "world", .job = do_world},
    ...
    {.str = 0} //signal end of list of keywords

};

void run(const char *cmd)
{
    unsigned int cmdhash = get_hash(cmd);
    struct keyword *kw = mykeywords;
    while(kw->str) {
        //If hash matches then compare the string, since we should consider hashing collisions too!
        //The order of conditions below is important
        if (kw->hash == cmdhash && !strcmp(cmd, kw->str)) { 
             kw->job();
             break;   
        }
        kw++;
    }
}
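The comment in the table above mentions an init() function for the case where the hashes cannot be computed at compile time. A minimal sketch of that variant, assuming the get_hash() from the question and only the two keywords shown. The function init_keywords(), the counters and the int return value of run() are hypothetical additions for illustration.

```c
#include <string.h>

struct keyword
{
    unsigned int hash;
    const char *str;
    void (*job)(void);
};

/* Same hash as in the question. */
static unsigned int get_hash(const char *s)
{
    unsigned int hash = 0;
    int c;

    while ((c = *s++))
    {
        hash = ((hash << 5) + hash) ^ c;
    }
    return hash;
}

/* Hypothetical handlers; the counters only make the effect observable. */
static int hello_runs, world_runs;
static void do_hello(void) { ++hello_runs; }
static void do_world(void) { ++world_runs; }

/* Hashes start at 0 and are filled in by init_keywords(). */
static struct keyword mykeywords[] =
{
    { 0, "hello", do_hello },
    { 0, "world", do_world },
    { 0, NULL, NULL } /* end of list */
};

/* Call once at startup, before the first run(). */
static void init_keywords(void)
{
    for (struct keyword *kw = mykeywords; kw->str != NULL; ++kw)
    {
        kw->hash = get_hash(kw->str);
    }
}

/* Returns 1 if a keyword matched and its job ran, 0 otherwise. */
static int run(const char *cmd)
{
    unsigned int cmdhash = get_hash(cmd);

    for (struct keyword *kw = mykeywords; kw->str != NULL; ++kw)
    {
        /* Cheap integer test first; strcmp() rules out collisions. */
        if (kw->hash == cmdhash && strcmp(cmd, kw->str) == 0)
        {
            kw->job();
            return 1;
        }
    }
    return 0;
}
```

Computing the hashes at startup costs one pass over the table and avoids keeping hand-written constants in sync with the keyword strings.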

5 answers

Answer 1: score 5, accepted, 2012-09-04 17:07:10
Answer 2: score 3, 2012-09-04 16:58:53
Answer 3: score 1, 2012-09-04 18:39:26
Answer 4: score 1, 2012-09-05 08:49:16
Answer 5: score 0, 2012-09-04 17:04:32
