Finding and printing common letters in different strings

I need to find and print common characters in different strings. My code does not work as it should: it checks for the same letters at the same index, but that's not what I want. I couldn't find a better solution for now. Thank you for the help :)

#include <iostream>
#include <string>
using namespace std;

int main() {
    string niz1, niz2, niz3 = "";
    cout << "string 1: ";
    getline(cin, niz1);
    cout << "string 2: ";
    getline(cin, niz2);

    for (int i = 0; i < niz1.length() - 1; i++) {
        for (int j = 0; j < niz2.length() - 1; j++) {
            if (niz1[i] == niz2[j])
                niz3 += niz1[i];
        }
    }

    cout << "Same letters are: " << niz3 << endl;
    return 0;
}

Below is the corrected, working code. The only correction needed is to give both loops an upper bound of niz.length() instead of niz.length() - 1. With - 1, the last character of each string is never compared (for example, with adbc and cde the common letter c is missed).

Variant 1:

#include <string>
#include <iostream>
using namespace std;

int main() {
    string niz1, niz2, niz3 = "";
    cout << "string 1: ";
    getline(cin, niz1);
    cout << "string 2: ";
    getline(cin, niz2);

    for (int i = 0; i < niz1.length(); i++) {
        for (int j = 0; j < niz2.length(); j++) {
            if (niz1[i] == niz2[j])
                niz3 += niz1[i];
        }
    }

    cout << "Same letters are: " << niz3 << endl;

    return 0;
}

Input:

string 1: adbc
string 2: cde

Output:

Same letters are: dc

If you also want the letters sorted and unique, you can additionally use std::set, as in the code below:

Variant 2:

#include <string>
#include <iostream>
#include <set>
using namespace std;

int main() {
    string niz1, niz2, niz3 = "";
    cout << "string 1: ";
    getline(cin, niz1);
    cout << "string 2: ";
    getline(cin, niz2);

    for (int i = 0; i < niz1.length(); i++) {
        for (int j = 0; j < niz2.length(); j++) {
            if (niz1[i] == niz2[j])
                niz3 += niz1[i];
        }
    }

    set<char> unique(niz3.begin(), niz3.end());
    niz3.assign(unique.begin(), unique.end());

    cout << "Same letters are: " << niz3 << endl;

    return 0;
}

Input:

string 1: adbcda
string 2: cdecd

Output:

Same letters are: cd

You can also use just sets plus the standard set_intersection function. This solves your task faster, in O(N*log(N)) time instead of your O(N^2) time.

Variant 3:

#include <string>
#include <iostream>
#include <set>
#include <algorithm>
#include <iterator>
using namespace std;

int main() {
    string niz1, niz2, niz3 = "";
    cout << "string 1: ";
    getline(cin, niz1);
    cout << "string 2: ";
    getline(cin, niz2);

    set<char> s1(niz1.begin(), niz1.end()), s2(niz2.begin(), niz2.end());
    set_intersection(s1.begin(), s1.end(), s2.begin(), s2.end(), back_inserter(niz3));

    cout << "Same letters are: " << niz3 << endl;

    return 0;
}

Input:

string 1: adbcda
string 2: cdecd

Output:

Same letters are: cd

Instead of set, it is also possible to use unordered_set, which gives an even faster algorithm, especially for long strings: the running time is (expected) O(N), compared to O(N * log(N)) for the set solution. The only drawback is that, unlike the set solution, the output of the unordered_set solution is unsorted (though still unique), because unordered sets don't keep their data sorted.

Variant 4:

#include <string>
#include <iostream>
#include <unordered_set>
#include <algorithm>
#include <iterator>
using namespace std;

int main() {
    string niz1, niz2, niz3;
    cout << "string 1: ";
    getline(cin, niz1);
    cout << "string 2: ";
    getline(cin, niz2);

    unordered_set<char> s1(niz1.begin(), niz1.end()), s2;
    for (size_t i = 0; i < niz2.length(); ++i)
        if (s1.count(niz2[i]))
            s2.insert(niz2[i]);
    niz3.assign(s2.begin(), s2.end());

    cout << "Same letters are: " << niz3 << endl;

    return 0;
}

Input:

string 1: adbcda
string 2: cdecd

Output:

Same letters are: dc

One more way is to use plain for loops like you did, without sets, but with an extra block of loops that removes non-unique letters, as in the code below. Compared to the set method, the only drawbacks of this loop method are that it runs slower and produces an unsorted output string.

Variant 5:

#include <string>
#include <iostream>
using namespace std;

int main() {
    string niz1, niz2, niz3, niz4;
    cout << "string 1: ";
    getline(cin, niz1);
    cout << "string 2: ";
    getline(cin, niz2);

    for (int i = 0; i < niz1.length(); ++i)
        for (int j = 0; j < niz2.length(); ++j)
            if (niz1[i] == niz2[j])
                niz3 += niz1[i];
    
    for (int i = 0; i < niz3.length(); ++i) {
        bool exists = false;
        for (int j = 0; j < niz4.length(); ++j)
            if (niz4[j] == niz3[i]) {
                exists = true;
                break;
            }
        if (!exists)
            niz4 += niz3[i];
    }

    cout << "Same letters are: " << niz4 << endl;

    return 0;
}

Input:

string 1: adbcda
string 2: cdecd

Output:

Same letters are: dc

This is sort of an answer in itself, and sort of an extended comment on @Arty's answer.

Hash tables (which are what underlies an unordered_set or unordered_map) are really useful under many circumstances. But in this case, they're kind of overkill. In particular, a hash table is basically a way of creating a sparse array for cases where it's unrealistic or unreasonable to use the underlying "key" type directly as an index into an array.

In this case, however, what we're using as the key in the hash table is a character: a single byte. That is small enough that it's utterly trivial to just use an array and use the byte directly as an index into it.

So, with arrays instead of hash tables, we get code something like this:

#include <array>
#include <string>
#include <iostream>
#include <chrono>

std::string getstring(std::string const &s) {
    std::cout << s << ": ";
    std::string input;
    std::getline(std::cin, input);
    return input;
}

using namespace std::chrono;

int main() {
    // One flag per possible byte value: a[c] != 0 means c occurs in s1, b[c] likewise for s2.
    std::array<char, 256> a = {0};
    std::array<char, 256> b = {0};
    std::array<char, 256> result = {0};
    std::size_t pos = 0;

    std::string s1 = getstring("s1");
    std::string s2 = getstring("s2");

    std::cout << "s1: " << s1 << "\n";
    std::cout << "s2: " << s2 << "\n";

    auto start = high_resolution_clock::now();

    // Index with unsigned char so bytes >= 128 can't become negative
    // indices on platforms where char is signed.
    for (unsigned char c : s1)
        a[c] = 1;
    for (unsigned char c : s2)
        b[c] = 1;

    // Use <= so that 'z' and 'Z' are included.
    for (int i = 'a'; i <= 'z'; i++)
        if (a[i] != 0 && b[i] != 0)
            result[pos++] = static_cast<char>(i);
    for (int i = 'A'; i <= 'Z'; i++)
        if (a[i] != 0 && b[i] != 0)
            result[pos++] = static_cast<char>(i);

    auto stop = high_resolution_clock::now();

    std::cout << "Common characters: " << std::string(result.data(), pos) << "\n";
    std::cout << "Time: " << duration_cast<nanoseconds>(stop - start).count() << " ns\n";
}

To get some repeatable test conditions, I built an input file with a couple of fairly long strings:

asdffghjllkjpoiuwqertqwerxvzxvcn
qqweroiglkgfpoilkagfskeqwriougfkljzxbvckxzv

After adding instrumentation (timing code) to his Variant 4 code, I found that his code ran in about 12,000 to 12,500 nanoseconds.

The code above, on the other hand, runs (on the same hardware) in about 850 nanoseconds, or around 15 times as fast.

I'm going to go on record as saying the opposite: although this is clearly faster, I'm pretty sure it's not the fastest possible. I can see at least one fairly obvious improvement (storing only one bit per character instead of one byte) that would probably improve speed by at least 2x, and could theoretically yield an improvement of around 8x. Unfortunately, we've already sped other things up enough that I doubt we'd see 8x; we'd probably hit a bottleneck on reading the data in from memory first (but it's hard to be sure, and it's likely to vary between processors). So I'm going to leave that alone, at least for now. For now, I'll settle for being only about fifteen times faster than "the fastest possible solution"... :-)

I suppose, in fairness, he probably really meant asymptotically the fastest. His has expected (but not guaranteed) O(N) complexity, and mine also has O(N) complexity (but in this case basically guaranteed, not just expected). In other words, the 15x is roughly a constant factor, not one we expect to change significantly with the size of the input strings. Nonetheless, even if it's "only" a constant factor, 15x is still a pretty noticeable difference in speed.

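Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.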

 
© 2020-2024 STACKOOM.COM