
Most efficient way to search an array of string objects in C++?

I've been searching for more information on this topic and can't seem to find the answer I'm looking for, so I hope you can help!

Part of an assignment I'm working on is to write a program that searches an array of strings (an address book) and returns the matches if a full or partial match is found. I'm able to do it easily using an array of C-strings, with the strstr() function called in a for loop and a pointer set to the result of searching each array entry for the user's input keyword (see below).

My question is, how would I be able to do this, if at all, using string objects? I also need to take into account that there may be more than one possible match. Is this also the most efficient way of writing this program? I've already submitted my working version; I'm just curious about other methods to accomplish the same task!

#include <iostream>
#include <cstring>
using namespace std;

int main()
{

  bool isFound = false;         // Flag to indicate whether contact is found
  const int SIZE = 11;          // Size of contacts array
  const int MAX = 50;           // Maximum characters per row
  char contacts[SIZE][MAX] = { 
                                "Jig Sawyer, 555-1223",
                                "Michael Meyers, 555-0097",
                                "Jason Vorhees, 555-8787",
                                "Norman Bates, 555-1212",
                                "Count Dracula, 555-8878",
                                "Samara Moran, 555-0998",
                                "Hannibal Lector, 555-8712",
                                "Freddy Krueger, 555-7676",
                                "Leather Face, 555-9037",
                                "George H Bush, 555-4939",
                                "George W Bush, 555-2783"
                              };
  char *ptr = NULL;             // Pointer to search string within contacts
  char input[MAX];              // User search input string


  // Get the user input
  cout << "Please enter a contact to lookup in the address book: ";
  cin.getline(input,MAX);

  // Lookup contact(s)
  for (int i=0; i<SIZE; i++)
  {
    ptr = strstr(contacts[i], input);
    if (ptr != NULL)
      {
        cout << contacts[i] << endl;
        isFound = true;
      }
  }

  // Display error message if no matches found
  if (!isFound)
    cout << "No contacts found." << endl;

  return 0;
} 

As you can tell, I like horror movies :)

You really need to break each string into sortable components. If you don't know about structures yet, you can use more arrays. This would allow you to build "index" tables that would speed up the search.
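
As a rough illustration of this idea, the sketch below splits each record into a name and a phone number and builds an index ordered by phone number. The names Contact, parseContact and buildPhoneIndex are hypothetical, and the ", " separator is an assumption based on the sample data in the question.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical record type: one contact split into separately sortable fields.
struct Contact {
    std::string name;
    std::string phone;
};

// Splits "Name, 555-1234" at the first ", " separator (assumed format).
// If no separator is present, the whole line is treated as the name.
Contact parseContact(const std::string& line)
{
    const std::string::size_type pos = line.find(", ");
    if (pos == std::string::npos) {
        return Contact{line, ""};
    }
    return Contact{line.substr(0, pos), line.substr(pos + 2)};
}

// Builds an "index" table: positions into `contacts`, ordered by phone number.
// The records themselves are left in their original order.
std::vector<std::size_t> buildPhoneIndex(const std::vector<Contact>& contacts)
{
    std::vector<std::size_t> index(contacts.size());
    std::iota(index.begin(), index.end(), std::size_t{0});  // 0, 1, 2, ...
    std::sort(index.begin(), index.end(),
              [&contacts](std::size_t a, std::size_t b) {
                  return contacts[a].phone < contacts[b].phone;
              });
    return index;
}
```

A second index ordered by name could be built the same way, so that either field can be searched with a binary search without reordering the records.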

The most efficient method depends on the quantity of the data and the organization of the data.

For small sets of data, the time difference between the different search methods is usually negligible; some other part of your program (such as input or output) will take longer.
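
For a data set as small as the address book in the question, a plain linear scan is probably enough. The following is a minimal sketch, assuming std::string::find as the std::string counterpart of strstr(); the helper name findMatches is made up for illustration and collects every match, so more than one result is handled.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): returns every contact
// that contains `keyword` anywhere in the string, in the original order.
std::vector<std::string> findMatches(const std::vector<std::string>& contacts,
                                     const std::string& keyword)
{
    std::vector<std::string> matches;
    for (const std::string& contact : contacts) {
        // find() returns std::string::npos when the keyword is absent.
        if (contact.find(keyword) != std::string::npos) {
            matches.push_back(contact);
        }
    }
    return matches;
}
```

Like strstr(), find() is case-sensitive and matches the keyword at any position, so a search for part of a phone number works as well as a search for part of a name.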

With string data, most of the time is spent comparing each character of one string to another. The other operations, such as moving indices around, are negligible.

Comparisons of string search methods have already been carried out and published, so search the web for "Performance string searching comparing".

An alternative would be to use regular expressions for string search. There's a lot of info out there, so I'll just provide a simple example where you try to match a subrange of the records (addresses) against word2Search (which I've hardcoded to avoid cluttering the example).

I'm also using a preprocessing step (a technique already mentioned in the comments) in which the array is sorted. Beware of two things:

  • The sorting is done to enable a fast searching method, i.e. binary search (implemented here with lower_bound).

  • If the word you are searching for is not at the beginning of a record, there is no point in sorting the records, since you won't be able to find a valid range (here it to ite) to search in. For example, if you search for the numbers, the strings are still sorted lexicographically by their first characters, so sorting is no help in locating 555 among strings beginning with M, J and so on.

Explanation in the comments:

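#include <algorithm>
#include <iostream>
#include <iterator>
#include <regex>
#include <string>
using namespace std;

int main()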
{
    // 1. Minor change - an array of strings is used
    string contacts[] = { 
        "Jig Sawyer, 555-1223",
        "Michael Meyers, 555-0097",
        "Jason Vorhees, 555-8787",
        "Norman Bates, 555-1212",
        "Count Dracula, 555-8878",
        "Samara Moran, 555-0998",
        "Hannibal Lector, 555-8712",
        "Freddy Krueger, 555-7676",
        "Leather Face, 555-9037",
        "George H Bush, 555-4939",
        "George W Bush, 555-2783"
    };
    // 2. The array is sorted to allow for binary search
    sort(begin(contacts), end(contacts));
    // 3. Example hard coded a word to search 
    string word2Search = "George";
    // 4. A regular expression is formed out of the target word
    regex expr(word2Search);
    // 5. Upper and lower bounds are set for the search
    char f = word2Search[0];
    std::string val1(1, f);
    std::string val2(1, ++f);
    // 6. Perform the search using regular expressions
    for (auto it(lower_bound(begin(contacts), end(contacts), val1)), 
        ite(lower_bound(begin(contacts), end(contacts), val2)); it != ite; ++it)
    {
        if (regex_search(it->begin(), it->end(), expr)) {
            cout << *it << endl;
        }
    }

    return 0;
}
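
When the keyword is expected at the start of a record, the same sorted-array idea can be sketched without regular expressions. This is a variant under stated assumptions, not the code above: the helper name findByPrefix is hypothetical, the input must already be sorted, and the whole keyword is used as the lower bound instead of only its first letter.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: returns all entries of `sorted` that start with `prefix`.
// Precondition: `sorted` is in ascending order (e.g. after std::sort).
std::vector<std::string> findByPrefix(const std::vector<std::string>& sorted,
                                      const std::string& prefix)
{
    std::vector<std::string> matches;
    // Binary search for the first entry that is not less than the prefix.
    // In a sorted range, the entries starting with the prefix are contiguous
    // and begin at this position.
    auto it = std::lower_bound(sorted.begin(), sorted.end(), prefix);
    // Walk forward while the leading characters still equal the prefix.
    for (; it != sorted.end() && it->compare(0, prefix.size(), prefix) == 0; ++it) {
        matches.push_back(*it);
    }
    return matches;
}
```

Because the comparison is a plain character comparison, characters such as '.' in the keyword are matched literally rather than being interpreted as regular expression syntax. As noted in the second point above, this only finds keywords at the beginning of a record.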
