How to compare long strings in C++?

Question

I know how to compare two strings with "==" or "compare", but if the strings are very long, should we run them through a hash function and compare the hash codes instead?

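#include <chrono>
#include <cstddef>
#include <functional>
#include <iostream>
#include <string>

using std::cout;
using std::endl;
using std::string;

static int n = 100000;  // number of comparisons per timing loop

// Direct comparison: can stop at the first differing character.
bool TestCompare(const string& a, const string& b) {
    return a == b;
}

// Hash-based comparison: hashes both strings in full on every call.
// Note: equal hashes do not guarantee equal strings (collisions are possible).
bool TestCompareHash(const string& a, const string& b) {
    std::hash<std::string> hash_fn;

    std::size_t str_hash_a = hash_fn(a);
    std::size_t str_hash_b = hash_fn(b);
    return str_hash_a == str_hash_b;
}

int main()
{
    string a(100, 'a');
    string b(100, 'c');
    volatile bool sink = false;  // helps keep the compiler from discarding the unused results

    // steady_clock is monotonic, which makes it the better fit for measuring intervals
    auto now = std::chrono::steady_clock::now();
    for (int i = 0; i < n; i++) {
        sink = TestCompare(a, b);
    }
    std::chrono::duration<float> difference = std::chrono::steady_clock::now() - now;
    cout << "difference.count() 1: " << difference.count() << endl;

    now = std::chrono::steady_clock::now();
    for (int i = 0; i < n; i++) {
        sink = TestCompareHash(a, b);
    }
    difference = std::chrono::steady_clock::now() - now;
    cout << "difference.count() 2: " << difference.count() << endl;

    return 0;
}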

I tested this code and found that the hash-based test slows down as the strings get longer. Why?

When the string length is 100 (times in seconds):

difference.count() 1: 0.00263665 
difference.count() 2: 0.00713478   //hash

When the string length is 10000 (times in seconds):

difference.count() 1: 0.00322366  
difference.count() 2: 1.99765    //hash

Following suggestions in the comments, I improved the test, for example by making both strings match exactly except for the last character.

It seems that hashing does not reduce the amount of computation. It may be worth doing this kind of thing in a database to avoid a single-point problem, but it may not make much sense for comparing strings?

Answer 1

In your case the main issue is that you need to compute those hashes first, and that costs more than comparing the strings directly (which "compares chars until they don't match", O(n) complexity at worst). You didn't provide hash_fn(), but a hash function generally must "go over all chars" (O(n) complexity).
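To make the cost difference concrete, here is a minimal sketch of how much work an element-by-element equality check does before it can answer. CharsExamined is a hypothetical helper written for this illustration, not a standard library function, and the up-front length check is an assumption about how typical implementations of operator== behave.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the standard library): returns how many
// characters an element-by-element equality check looks at before it can
// decide whether a and b are equal.
std::size_t CharsExamined(const std::string& a, const std::string& b) {
    // Assumption: the check compares lengths first, as common
    // implementations of operator== do, so no character is read here.
    if (a.size() != b.size()) {
        return 0;
    }
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] != b[i]) {
            return i + 1;  // stopped at the first mismatch, at index i
        }
    }
    return a.size();  // equal strings: every character was examined
}
```

With the strings from the question, string(10000, 'a') and string(10000, 'c'), this stops after one character, while a hash of either string still has to read all 10000. Only when the strings are equal, or differ near the end, does the direct comparison read as much as one hash computation.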

Hashes would help if you compute and store them once and then expect to compare the same strings many times.
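A rough sketch of that "compute once, compare many times" idea follows. HashedString and Equal are hypothetical names introduced here for illustration, and the fallback to a full comparison when the hashes match is my addition, since two different strings can share a hash.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical wrapper: stores a string together with its hash, which is
// computed once at construction (O(n)) and reused by every later comparison.
struct HashedString {
    std::string text;
    std::size_t hash;

    // text is declared before hash, so it is initialized first and can
    // safely be hashed in the initializer of hash.
    explicit HashedString(std::string s)
        : text(std::move(s)), hash(std::hash<std::string>{}(text)) {}
};

// Equality check that uses the cached hashes as a cheap filter.
bool Equal(const HashedString& a, const HashedString& b) {
    if (a.hash != b.hash) {
        return false;  // different hashes: the strings certainly differ
    }
    // Same hash can still be a collision, so confirm with a full comparison.
    return a.text == b.text;
}
```

The saving shows up when most comparisons involve different strings: those are rejected by comparing two integers. Equal strings still pay for a full comparison, so this is a filter rather than a replacement for operator==.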

Note that the hashes can be used only to compare for equality (e.g. no > or <).

Solution 1
2 2021-03-13 10:55:06