C++: how to build an intersection string of two space separated strings?

I have two space separated strings... (the X doesn't mean the same symbol)

st1 = "abc def kok...."
st2 = "kok bbr def ffe ...."

I would like to construct an intersection string as follows: common = "kok def"

What is an efficient way to do so in C++?

Thanks

Use std::set_intersection

Sample program:

I'm assuming you've tokenized your strings already (this solution seems easy to implement).

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

int main() {
    // Data (already tokenized)
    std::vector<std::string> a, b;
    a.push_back("abc"); b.push_back("kok");
    a.push_back("def"); b.push_back("bbr");
    a.push_back("kok"); b.push_back("def");
    a.push_back("foo"); b.push_back("ffe");

    // Allocate space for the intersection (more than it can ever need)
    std::vector<std::string> v(a.size() + b.size());

    // Sort as required by set_intersection
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());

    // Compute; the returned iterator marks the end of the result
    std::vector<std::string>::iterator it =
        std::set_intersection(a.begin(), a.end(), b.begin(), b.end(), v.begin());

    // Drop the unused slots, then display
    v.erase(it, v.end());
    for (std::vector<std::string>::iterator i = v.begin(); i != v.end(); ++i)
        std::cout << *i << std::endl;
    return 0;
}

Complexity should be O(n log n) in the number of tokens (or sub-strings).
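The sample above starts from already tokenized input. As a rough sketch of the whole path from the two strings to the result string, assuming whitespace is the only separator, something like the following might do. The helper names tokenize and common_words are made up for illustration, and the result comes out in sorted order ("def kok") rather than in the order of either input.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a string on whitespace into tokens.
std::vector<std::string> tokenize(const std::string& s) {
    std::istringstream in(s);
    return std::vector<std::string>(std::istream_iterator<std::string>(in),
                                    std::istream_iterator<std::string>());
}

// Hypothetical helper: intersection of two space separated strings,
// returned as a space separated string in sorted order.
std::string common_words(const std::string& st1, const std::string& st2) {
    std::vector<std::string> a = tokenize(st1);
    std::vector<std::string> b = tokenize(st2);

    // set_intersection requires both ranges to be sorted.
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());

    // back_inserter grows the result as needed, so no pre-sizing or erase.
    std::vector<std::string> v;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(v));

    // Join the common tokens with single spaces.
    std::string result;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (i != 0) result += ' ';
        result += v[i];
    }
    return result;
}
```

Calling common_words("abc def kok foo", "kok bbr def ffe") yields "def kok".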

  1. Split st1 into substrings and put them all into a std::set
  2. Split st2 into substrings and check for each of them whether it exists in the set created in step 1.

This will give O(n log n) execution time. You have to loop through both strings exactly once. Insertion into and lookup in the set are usually O(log n) for each element, which gives O(n log n) overall.
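A minimal sketch of those two steps, assuming whitespace separated tokens; the function name common_via_set is invented for this example. The common tokens come back in the order they appear in st2, and a token repeated in st2 would be reported each time it occurs.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: tokens of st2 that also occur in st1, in st2's order.
std::vector<std::string> common_via_set(const std::string& st1,
                                        const std::string& st2) {
    // Step 1: put every token of st1 into a set.
    std::set<std::string> seen;
    std::istringstream in1(st1);
    std::string word;
    while (in1 >> word) seen.insert(word);

    // Step 2: keep each token of st2 that is found in the set.
    std::vector<std::string> common;
    std::istringstream in2(st2);
    while (in2 >> word) {
        if (seen.count(word) != 0) common.push_back(word);
    }
    return common;
}
```

For st1 = "abc def kok" and st2 = "kok bbr def ffe" this returns "kok" followed by "def", which matches the order asked for in the question.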

If you can use a hash based set (or some other unordered set) with O(1) average insert and lookup complexity, you will cut the complexity down to O(n).
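As a sketch of the hash based variant, assuming a C++11 compiler so that std::unordered_set is available; common_via_hash is a hypothetical name. This version erases each word from the set once it has matched, so a word repeated in st2 is reported only once. Note that the O(1) figure for std::unordered_set is an average case, not a guarantee.

```cpp
#include <sstream>
#include <string>
#include <unordered_set>

// Hypothetical helper: space separated words common to st1 and st2,
// each reported once, in the order they first appear in st2.
std::string common_via_hash(const std::string& st1, const std::string& st2) {
    // Insert every token of st1 into a hash set.
    std::unordered_set<std::string> seen;
    std::istringstream in1(st1);
    std::string word;
    while (in1 >> word) seen.insert(word);

    std::string common;
    std::istringstream in2(st2);
    while (in2 >> word) {
        // erase returns the number of elements removed (0 or 1 here),
        // so a second occurrence of the same word no longer matches.
        if (seen.erase(word) != 0) {
            if (!common.empty()) common += ' ';
            common += word;
        }
    }
    return common;
}
```

With st1 = "abc def kok" and st2 = "kok bbr def kok ffe" the result is "kok def".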

To expand a bit on the answers you've already gotten, there are basically two factors to consider that you haven't specified. First, if there are duplicate elements in the input, do you want those considered for the output? For example, given input like:

st1 = "kok abc def kok...."
st2 = "kok bbr kok def ffe ...."

Since "kok" appears twice in both inputs, should "kok" appear once in the output or twice?

The second is your usage pattern. Do you read all the input and then generate a single output, or is it more iterative, where you might read some input, generate an output, read some more input that's added to the previous, generate another output, and so on?

If you're going to read all the input, then generate one output, you probably want to use std::vector followed by std::sort. If you want each input to appear only once in the output, regardless of how often it appears in both inputs, then you'd follow that with std::unique, and finally do your set_intersection.
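A sketch of that sort, unique, set_intersection sequence, assuming the tokens are already in vectors; sort_unique and unique_intersection are names made up for the example. std::unique only moves the repeats to the end of the range, so it is paired with erase to actually drop them.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: sort a token list and drop repeated tokens.
void sort_unique(std::vector<std::string>& v) {
    std::sort(v.begin(), v.end());
    // unique returns the new logical end; erase removes the leftovers.
    v.erase(std::unique(v.begin(), v.end()), v.end());
}

// Hypothetical helper: each common token once, in sorted order.
// The vectors are taken by value so the caller's data stays untouched.
std::vector<std::string> unique_intersection(std::vector<std::string> a,
                                             std::vector<std::string> b) {
    sort_unique(a);
    sort_unique(b);
    std::vector<std::string> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}
```

For the tokens of "kok abc def kok" and "kok bbr kok def ffe" this gives "def" and "kok", each once.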

If you want to support iterative updates, then you probably want to use std::set or std::multiset (std::set for each output to be unique, std::multiset if repeated inputs should give repeated results).
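One way the iterative pattern might look with std::multiset is sketched below; add_words and current_common are hypothetical helpers. The containers stay sorted as input arrives, so set_intersection can be rerun at any point without a separate sort. On multisets, a word that appears m times in one input and n times in the other appears min(m, n) times in the result.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: add the tokens of `more` to an existing multiset.
void add_words(std::multiset<std::string>& words, const std::string& more) {
    std::istringstream in(more);
    std::string w;
    while (in >> w) words.insert(w);
}

// Hypothetical helper: intersection of the words collected so far.
std::vector<std::string> current_common(const std::multiset<std::string>& a,
                                        const std::multiset<std::string>& b) {
    std::vector<std::string> out;
    // multiset iteration is already sorted, as set_intersection requires.
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}
```

Starting with "kok abc def" and "kok bbr kok def ffe", the common words are "def" and "kok"; after adding another "kok" to the first multiset, the result becomes "def", "kok", "kok". Swapping std::multiset for std::set would keep each word unique instead.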

Edit: based on the lack of duplication in the input, a really quick, simple implementation would be something like:

#include <string>
#include <set>
#include <algorithm>
#include <iterator>
#include <sstream>
#include <iostream>

int main() {   
    std::string st1("abc def kok");
    std::string st2("kok bbr def ffe");

    std::istringstream s1(st1);
    std::istringstream s2(st2);

    // Build the word sets from the streams. The extra parentheses around
    // the first argument avoid the most vexing parse.
    std::set<std::string> words1((std::istream_iterator<std::string>(s1)), 
                                 std::istream_iterator<std::string>());

    std::set<std::string> words2((std::istream_iterator<std::string>(s2)), 
                                 std::istream_iterator<std::string>());

    std::ostringstream common;

    // put the intersection into common:
    std::set_intersection(words1.begin(), words1.end(), 
                          words2.begin(), words2.end(),
                          std::ostream_iterator<std::string>(common, " "));

    std::cout << common.str();  // show the result.
    return 0;
}
