简体   繁体   中英

C++: how to build an intersection string of two space separated strings?

I have two space separated strings... (the X doesn't mean the same symbol)

st1 = "abc def kok...."
st2 = "kok bbr def ffe ...."

i would like to construct an intersection string as follows: common = "kok def"

what is the efficient way to do so in c++?

Thanks

Use std::set_intersection

Sample program:

I'm assuming you've tokenized your strings already ( this solution seems easy to implement).

// Data
std::vector<string> a,b;
a.push_back("abc");b.push_back("kok");
a.push_back("def");b.push_back("bbr");
a.push_back("kok");b.push_back("def");
a.push_back("foo");b.push_back("ffe");

// Allocate space for intersection
std::vector<string> v(a.size()+b.size());

// Sort as required by set_intersection
std::sort(a.begin(),a.end());
std::sort(b.begin(),b.end());
// Compute
std::vector<string>::iterator it = std::set_intersection(a.begin(),a.end(),b.begin(),b.end(),v.begin());

// Display
v.erase(it,v.end());
for(std::vector<string>::iterator it = v.begin();it < v.end(); ++it) std::cout<<*it<<std::endl;

Complexity should be O(n log n) in the number of tokens (or sub-strings).

  1. Split st1 in substrings and put them all into a std::set
  2. Split st2 in substrings and check for each of them if they exist in the set created in step 1.

This will give O(n log n) execution time. You have to loop through both strings exactly once. Insertion and retrieval from the set is usually O(log n) for each element, which gives O(n log n) .

If you can use a hash based set (or some other unordered set) with O(1) insert and retrieval complexity you will cut the complexity down to O(n) .

To expand a bit on the answers you've already gotten, there are basically two factors to consider that you haven't specified. First, if there are duplicate elements in the input, do you want those considered for the output. For example, given input like:

st1 = "kok abc def kok...."
st2 = "kok bbr kok def ffe ...."

Since "kok" appears twice in both inputs, should "kok" appear once in the output or twice?

The second is your usage pattern. Do you have a pattern of reading all the input, then generating a single output, or is a more iterative, where you might read some input, generate an output, read some more input that's added to the previous, generate another output, and so on?

If you're going read all the input, then generate one output, you probably want to use std::vector followed by std::sort . If you only want each input to appear only once in the output, regardless of how often it appears in both inputs, then you'd follow that by std::unique , and finally do your set_intersection .

If you want to support iterative updates, then you probably want to use std::set or std::multiset ( std::set for each output to be unique, std::multiset if repeated inputs should give repeated results).

Edit: based on the lack of duplication in the input, a really quick simple implementation would be something like:

#include <string>
#include <set>
#include <algorithm>
#include <iterator>
#include <sstream>
#include <iostream>

int main() {   
    std::string st1("abc def kok");
    std::string st2("kok bbr def ffe");

    std::istringstream s1(st1);
    std::istringstream s2(st2);

    // Initialize stringstreams. Whine about most vexing parse.
    std::set<std::string> words1((std::istream_iterator<std::string>(s1)), 
                                 std::istream_iterator<std::string>());

    std::set<std::string> words2((std::istream_iterator<std::string>(s2)), 
                                 std::istream_iterator<std::string>());

    std::ostringstream common;

    // put the intersection into common:
    std::set_intersection(words1.begin(), words1.end(), 
                          words2.begin(), words2.end(),
                          std::ostream_iterator<std::string>(common, " "));

    std::cout << common.str();  // show the result.
    return 0;
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM