C++: How to build the intersection string of two space-separated strings?

Question

I have two space-separated strings (the X doesn't mean the same symbol):

st1 = "abc def kok...."
st2 = "kok bbr def ffe ...."

I would like to construct an intersection string as follows: common = "kok def"

What is an efficient way to do so in C++?

Thanks

Answer 1

Use std::set_intersection.

Sample program:

I'm assuming you've tokenized your strings already (this solution seems easy to implement).

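#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

int main() {
    // Data: the tokens of st1 go into a, the tokens of st2 into b
    std::vector<std::string> a, b;
    a.push_back("abc"); b.push_back("kok");
    a.push_back("def"); b.push_back("bbr");
    a.push_back("kok"); b.push_back("def");
    a.push_back("foo"); b.push_back("ffe");

    // Allocate space for the intersection (a generous upper bound)
    std::vector<std::string> v(a.size() + b.size());

    // Sort as required by set_intersection
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());

    // Compute; the returned iterator marks the end of the written range
    std::vector<std::string>::iterator end =
        std::set_intersection(a.begin(), a.end(), b.begin(), b.end(), v.begin());

    // Drop the unused slots, then display
    v.erase(end, v.end());
    for (std::vector<std::string>::iterator it = v.begin(); it != v.end(); ++it)
        std::cout << *it << std::endl;
    return 0;
}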

Complexity should be O(n log n) in the number of tokens (or sub-strings).
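The sample above starts from tokens that are already split out. As a rough sketch of the missing ends, the following splits both strings on whitespace, intersects them, and joins the result back into one string. The helper names tokenize and intersect_sorted are made up for illustration, and whitespace is assumed to be the only separator.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a string on whitespace into tokens.
std::vector<std::string> tokenize(const std::string& text) {
    std::istringstream in(text);
    return std::vector<std::string>(std::istream_iterator<std::string>(in),
                                    std::istream_iterator<std::string>());
}

// Hypothetical helper: intersection of two space-separated strings,
// returned as a space-separated string in sorted order.
std::string intersect_sorted(const std::string& st1, const std::string& st2) {
    std::vector<std::string> a = tokenize(st1);
    std::vector<std::string> b = tokenize(st2);

    // set_intersection requires both ranges to be sorted.
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());

    std::vector<std::string> common;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(common));

    // Join the common tokens with single spaces.
    std::string result;
    for (std::size_t i = 0; i < common.size(); ++i) {
        if (i != 0) result += ' ';
        result += common[i];
    }
    return result;
}
```

Calling intersect_sorted("abc def kok", "kok bbr def ffe") yields "def kok". Because both inputs are sorted first, the tokens come out in sorted order rather than in the "kok def" order shown in the question.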

Answer 2

1. Split st1 into substrings and put them all into a std::set.
2. Split st2 into substrings and check, for each of them, whether it exists in the set created in step 1.

This will give O(n log n) execution time. You have to loop through both strings exactly once. Insertion into and retrieval from the set are usually O(log n) for each element, which gives O(n log n) overall.

If you can use a hash-based set (or some other unordered set) with O(1) insertion and retrieval, you will cut the complexity down to O(n).
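A minimal sketch of the two steps above with a hash-based set might look like this. It assumes C++11 for std::unordered_set (average O(1) insertion and lookup), and the function name intersect_hashed is invented for the example. Erasing a word once it has matched is a choice made here so that each common word is reported only once; the answer does not prescribe it.

```cpp
#include <sstream>
#include <string>
#include <unordered_set>

// Hypothetical helper: words of st2 that also occur in st1,
// in the order they appear in st2, each reported once.
std::string intersect_hashed(const std::string& st1, const std::string& st2) {
    // Step 1: put every word of st1 into the set.
    std::unordered_set<std::string> seen;
    std::istringstream in1(st1);
    std::string word;
    while (in1 >> word) seen.insert(word);

    // Step 2: check each word of st2 against the set.
    std::string result;
    std::istringstream in2(st2);
    while (in2 >> word) {
        // erase returns the number of elements removed (0 or 1),
        // so a repeated word in st2 matches only the first time.
        if (seen.erase(word) != 0) {
            if (!result.empty()) result += ' ';
            result += word;
        }
    }
    return result;
}
```

With the strings from the question, intersect_hashed("abc def kok", "kok bbr def ffe") returns "kok def", since the output follows the order of st2 instead of being sorted.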

Answer 3

To expand a bit on the answers you've already gotten, there are basically two factors to consider that you haven't specified. First, if there are duplicate elements in the input, do you want those considered for the output? For example, given input like:

st1 = "kok abc def kok...."
st2 = "kok bbr kok def ffe ...."

Since "kok" appears twice in both inputs, should "kok" appear once in the output or twice?

The second is your usage pattern. Do you read all the input and then generate a single output, or is it more iterative, where you might read some input, generate an output, read some more input that's added to the previous, generate another output, and so on?

If you're going to read all the input, then generate one output, you probably want to use std::vector followed by std::sort. If you want each word to appear only once in the output, regardless of how often it appears in both inputs, then you'd follow that with std::unique (erasing the leftover tail it returns), and finally do your set_intersection.
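The read-everything-then-output pipeline described above can be sketched roughly as follows. The function names sort_unique and unique_intersection are made up for illustration, and the inputs are assumed to be already split into words.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: sort the words and drop repeats in place.
void sort_unique(std::vector<std::string>& words) {
    std::sort(words.begin(), words.end());
    // std::unique only moves the duplicates to the tail;
    // erase is what actually shrinks the vector.
    words.erase(std::unique(words.begin(), words.end()), words.end());
}

// Hypothetical helper: each common word appears once in the result.
// The vectors are taken by value so the caller's data is left untouched.
std::vector<std::string> unique_intersection(std::vector<std::string> a,
                                             std::vector<std::string> b) {
    sort_unique(a);
    sort_unique(b);
    std::vector<std::string> common;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(common));
    return common;
}
```

For the inputs "kok abc def kok" and "kok bbr kok def ffe", this gives the words "def" and "kok", each exactly once.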

If you want to support iterative updates, then you probably want to use std::set or std::multiset (std::set if each output should be unique, std::multiset if repeated inputs should give repeated results).
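As an illustrative sketch of the iterative case, the containers below stay sorted as words are added, so the intersection can be recomputed at any point without a separate sort. The helper names add_words and common_words are assumptions for this example. With std::multiset, a word that occurs m times in one input and n times in the other appears min(m, n) times in the result; swapping in std::set would report it once.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: add the whitespace-separated words of text to words.
void add_words(std::multiset<std::string>& words, const std::string& text) {
    std::istringstream in(text);
    std::string word;
    while (in >> word) words.insert(word);
}

// Hypothetical helper: current intersection of the two collections.
// A multiset iterates in sorted order, as set_intersection requires.
std::vector<std::string> common_words(const std::multiset<std::string>& a,
                                      const std::multiset<std::string>& b) {
    std::vector<std::string> common;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(common));
    return common;
}
```

After add_words(a, "kok abc def kok") and add_words(b, "kok bbr kok def ffe"), common_words(a, b) holds "def", "kok", "kok". A later add_words(a, "ffe") followed by another call also picks up "ffe".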

Edit: based on the lack of duplication in the input, a really quick, simple implementation would be something like:

#include <string>
#include <set>
#include <algorithm>
#include <iterator>
#include <sstream>
#include <iostream>

int main() {   
    std::string st1("abc def kok");
    std::string st2("kok bbr def ffe");

    std::istringstream s1(st1);
    std::istringstream s2(st2);

    // Initialize the word sets from the streams. The extra parentheses
    // around the first argument avoid the most vexing parse.
    std::set<std::string> words1((std::istream_iterator<std::string>(s1)), 
                                 std::istream_iterator<std::string>());

    std::set<std::string> words2((std::istream_iterator<std::string>(s2)), 
                                 std::istream_iterator<std::string>());

    std::ostringstream common;

    // put the intersection into common:
    std::set_intersection(words1.begin(), words1.end(), 
                          words2.begin(), words2.end(),
                          std::ostream_iterator<std::string>(common, " "));

    std::cout << common.str();  // show the result.
    return 0;
}
