Compare two files and send out equal values
I'm new here. I'm trying to do something I think should be easy, but I can't get it to work. I have two files that contain just simple data:
FileA
KIC
757137
892010
892107
892738
892760
893214
1026084
1435467
1026180
1026309
1026326
1026473
1027337
1160789
1161447
1161618
1162036
3112152
1163359
1163453
1163621
3123191
1164590
and FileB:
KICID
1430163
1435467
1725815
2309595
2450729
2837475
2849125
2852862
2865774
2991448
2998253
3112152
3112889
3115178
3123191
I'd like to read both files and then print out the values that appear in both, ignoring the titles. In this case I'd get that 1435467 and 3123191 are in both, and just these would be sent to a new file. So far I have:
#include <cmath>
#include <cstdlib>
#include <string>
#include <iomanip>
#include <iostream>
#include <fstream>
#include <ctime>
using namespace std;
// Globals, to allow being called from several functions
// main program
int main() {
    float A, B;
    ifstream inA("FileA"); // input stream
    ifstream inB("FileB"); // second instream
    ofstream outA("OutA.txt"); // output stream
    while (inA >> A) {
        while (inB >> B) {
            if (A == B) {
                outA << A << "\t" << B << endl;
            }
        }
    }
    return 0;
}
This just produces an empty document OutA. I thought it would read a line of FileA, then cycle through FileB until it found a match, send it to OutA, and then move on to the next line of FileA. Any help would be appreciated.
You need to put inB.seekg(0, inB.beg); at the end of the outer while loop. Otherwise you stay at the end of inB and read nothing more after the first entry of inA has been processed.
Another problem may be that you are using float for A and B. Try int (or string), as float may not behave as you expect with ==. Refer to this question for details: What is the most effective way for float and double comparison? Note also that extracting into a numeric type fails on the header line (KIC is not a number), which leaves the stream in a failed state, so the outer loop never runs.
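To illustrate the point about types, here is a minimal sketch of reading the IDs as integers while discarding the title line first. The helper names readIds and firstReadSucceeds are made up for this example, and the input is assumed to be one header line followed by whitespace-separated integers.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: skips one header line, then reads whitespace-separated
// integer IDs until extraction fails or the input ends.
std::vector<long> readIds(std::istream& in) {
    std::string header;
    std::getline(in, header);      // discard the title line ("KIC", "KICID", ...)
    std::vector<long> ids;
    long id;
    while (in >> id) {
        ids.push_back(id);
    }
    return ids;
}

// Hypothetical helper: reports whether the very first numeric extraction
// from the given text succeeds.
bool firstReadSucceeds(const std::string& text) {
    std::istringstream in(text);
    long value;
    return static_cast<bool>(in >> value);
}
```

Calling firstReadSucceeds on text that starts with a title such as KIC returns false, which is what happens to the outer loop in the question when A is a numeric type. With std::ifstream the same readIds function can be used unchanged, since it only relies on std::istream.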
This code worked on my platform:
...
while (inA >> A) {
    inB.clear();
    inB.seekg(0, inB.beg);
    while (inB >> B) {
        if (A == B) {
            outA << A << "\t" << B << endl;
        }
    }
}
Notice the inB.clear() and inB.seekg(...) calls. A and B are strings.
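The reason both calls are needed can be shown with a small sketch. Once a read fails at the end of the input, the stream keeps its error flags set and every further read fails until clear() is called; seekg then moves the read position back to the start. The helper names countTokens and rewindStream are invented for this example, and a std::istringstream stands in for the file so the snippet needs no files on disk.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: counts whitespace-separated tokens from the
// stream's current position to its end.
std::size_t countTokens(std::istream& in) {
    std::string token;
    std::size_t n = 0;
    while (in >> token) {
        ++n;
    }
    return n;
}

// Hypothetical helper: makes a stream that was read to its end readable again.
void rewindStream(std::istream& in) {
    in.clear();                   // reset eofbit/failbit left by the failed read
    in.seekg(0, std::ios::beg);   // move the read position back to the start
}
```

Calling countTokens twice in a row on the same stream yields zero the second time; calling rewindStream in between makes the second pass see all tokens again, which is what the inner loop over inB relies on.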
By the way, this method is only good for a quick-and-dirty implementation. It is not optimal for big files, as you get N * M complexity (N is the size of FileA, M is the size of FileB). By using a hash set you can get to nearly linear (N + M) complexity.
Example of a hash set implementation (C++11):
#include <string>
#include <iostream>
#include <fstream>
#include <unordered_set>
using namespace std;
int main() {
    string A, B;
    ifstream inA("FileA"); // input stream
    ifstream inB("FileB"); // second instream
    ofstream outA("OutA.txt"); // output stream
    unordered_set<string> setA;
    // store every entry of FileA in the hash set
    while (inA >> A) {
        setA.insert(A);
    }
    // look up each entry of FileB in the set
    while (inB >> B) {
        if (setA.count(B)) {
            outA << B << endl; // B appears in both files
        }
    }
    return 0;
}
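The example above treats the title lines like any other entry, so two files with the same title would report it as a match. A possible variant that discards the first token of each input is sketched below. The function name commonValues is made up for this sketch, and it assumes each input starts with exactly one single-word title.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns the entries of inB that also occur in inA,
// in the order they appear in inB. The first token of each stream is
// assumed to be a title and is discarded.
std::vector<std::string> commonValues(std::istream& inA, std::istream& inB) {
    std::string token;
    inA >> token;                          // discard the title of the first input
    std::unordered_set<std::string> setA;
    while (inA >> token) {
        setA.insert(token);
    }

    inB >> token;                          // discard the title of the second input
    std::vector<std::string> common;
    while (inB >> token) {
        if (setA.count(token)) {
            common.push_back(token);
        }
    }
    return common;
}
```

Taking std::istream references means the same function works with std::ifstream objects or with in-memory streams. An entry that is repeated in the second input is reported once per occurrence.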
Are both files small enough to read into memory?
You could try something similar to the following:
#include <algorithm>
#include <fstream>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> a;
    std::vector<std::string> b;
    std::ofstream outA("OutA.txt"); // output stream
    std::ifstream inA("FileA"); // input stream
    std::ifstream inB("FileB"); // second instream
    std::string value;
    inA >> value; // read first line (and don't use it - discarding header)
    while (inA >> value) { a.push_back(value); } // populate first vector
    inB >> value; // read first line (and don't use it - discarding header)
    while (inB >> value) { b.push_back(value); } // populate second vector
    // std::sort will perform a pretty efficient sort
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    // now that both are sorted, comparing is easier
    std::vector<std::string>::iterator ita = a.begin();
    std::vector<std::string>::iterator itb = b.begin();
    while (ita != a.end() && itb != b.end())
    {
        if (*ita > *itb)
            ++itb;
        else if (*ita < *itb)
            ++ita;
        else
        {
            outA << *ita << '\n'; // value is in both files
            ++ita; // advance both, otherwise the loop never ends
            ++itb;
        }
    }
    return 0;
}
This reads both files into memory, sorts them both, and then compares them. The comparison only has to go through each file once, which reduces its complexity immensely: O(a+b) instead of O(a*b). Of course the sorting has an overhead, but this should be more efficient for larger files, and for shorter files it should still be sufficiently fast (unless you are comparing lots and lots of small files). I believe that with std::sort the worst case for all this is O(a log a + b log b), which is better than O(a*b).
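The sort-then-walk step can also be expressed with the standard library's std::set_intersection, which performs the same single pass over two sorted ranges. This is only a sketch of that alternative: the function name sortedIntersection is made up here, and the values are assumed to have been read into vectors already, with the headers discarded.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: returns the values present in both vectors.
// The vectors are taken by value so the caller's data is left unsorted.
std::vector<std::string> sortedIntersection(std::vector<std::string> a,
                                            std::vector<std::string> b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    std::vector<std::string> common;
    // Both ranges must be sorted with the same ordering used here.
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(common));
    return common;
}
```

Because the values are compared as strings, the result comes back in lexicographic order rather than numeric order. That does not affect which values match, only the order in which they are written out.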
In the end I fixed it like so:
#include <cmath>
#include <cstdlib>
#include <string>
#include <iomanip>
#include <iostream>
#include <fstream>
#include <ctime>
using namespace std;
// Globals, to allow being called from several functions
// main program
int main() {
    string A, B;
    ifstream inA("FileA.txt"); // input stream
    ifstream inB("FileB.txt"); // second instream
    ofstream outA("OutA.txt"); // output stream
    while (inA >> A) { // take in first stream
        while (inB >> B) { // whilst that's happening take in second stream
            if (A == B) { // do they match? If so then send out the value
                outA << A << "\t" << B << endl; // THIS IS JUST TO SHOW A DOES = B!
            }
        } // end of B loop
        inB.clear(); // now clear the second stream (B)
        inB.seekg(0, inB.beg); // return to start of stream B
    } // move onto second input in stream A, and repeat
    return 0;
}