Compare two files and send out equal values

I'm new here and trying to do something I think should be easy, but I can't get it to work. I have two files that contain just simple data:

FileA

KIC
757137  
892010  
892107  
892738  
892760  
893214  
1026084
1435467
1026180
1026309
1026326
1026473
1027337
1160789
1161447
1161618
1162036
3112152
1163359
1163453
1163621
3123191
1164590

and FileB

KICID
1430163
1435467
1725815
2309595
2450729
2837475
2849125
2852862
2865774
2991448
2998253
3112152
3112889
3115178
3123191
...

I'd like to read both files and then print out the values that are the same, ignoring the titles. In this case, values such as 1435467 and 3123191 are in both files, and only these matching values would be sent to a new file. So far I have:

#include <cmath>
#include <cstdlib>
#include <string>
#include <iomanip>
#include <iostream>
#include <fstream>
#include <ctime>

using namespace std;

// Globals, to allow being called from several functions

// main program

int main() {
    float A, B;

    ifstream inA("FileA"); // input stream
    ifstream inB("FileB"); // second instream
    ofstream outA("OutA.txt"); // output stream

    while (inA >> A) {
        while (inB >> B) {

            if (A == B) {
                outA << A << "\t" << B << endl;
            }
        }
    }
    return 0;
}

This just produces an empty document, OutA. I thought this would read a line of FileA, cycle through FileB until it found a match, send it to OutA, and then move on to the next line of FileA. Any help would be appreciated.

You need to put

inB.seekg(0, inB.beg)

at the end of the outer while loop. Otherwise inB stays at its end, and nothing is read from it after the first entry of inA has been processed.

Another problem may be that you are using float for A and B. Try int (or string), as float may not behave as you expect with ==. Refer to this question for details: What is the most effective way for float and double comparison?
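To make the float point concrete, here is a small sketch (the function names count_float_tokens and floats_collide are hypothetical, introduced only for illustration). It shows two things: a numeric extraction loop stops at a non-numeric header such as KIC, so it may never read any data at all, and two different integers above 2^24 can compare equal once stored as float. The IDs shown in the question are below that limit, so the header is the more likely culprit there.

```cpp
#include <sstream>
#include <string>

// Counts how many tokens a float extraction loop reads from `text`.
// A non-numeric token (such as a header) makes the extraction fail,
// which ends the loop immediately.
int count_float_tokens(const std::string& text) {
    std::istringstream in(text);
    float value;
    int count = 0;
    while (in >> value) {
        ++count;
    }
    return count;
}

// Returns true when two integers become equal after conversion to float.
// Assumes IEEE 754 single precision, which has a 24-bit significand.
bool floats_collide(long a, long b) {
    return static_cast<float>(a) == static_cast<float>(b);
}
```

With a leading header line, count_float_tokens reads nothing, which would explain an empty output file. Reading into int after skipping the header, or reading into string, avoids both issues.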

This code worked on my platform:

...
while (inA >> A) {
  inB.clear();
  inB.seekg(0, inB.beg);
  while (inB >> B) {
    if (A == B) {
      outA << A << "\t" << B << endl;
    }
  }
}

Notice the inB.clear() before inB.seekg(...); here A and B are strings.
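The reason clear() matters can be seen in a minimal sketch using an in-memory stream instead of a file (can_reread_after_rewind is a hypothetical helper, not part of the code above). Once a read loop has run to the end, the stream has its failure state set, and seeking alone does not make it readable again.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Drains `in` to the end, optionally clears its error state, rewinds it,
// and reports whether a token can be read again afterwards.
bool can_reread_after_rewind(std::istringstream& in, bool clear_first) {
    std::string token;
    while (in >> token) {
        // consume everything; the final failed read sets eofbit and failbit
    }
    if (clear_first) {
        in.clear();              // reset the error flags so the stream is usable
    }
    in.seekg(0, std::ios::beg);  // go back to the start of the stream
    return static_cast<bool>(in >> token);
}
```

Called with clear_first set to false, the read after the rewind fails; with true, it succeeds and yields the first token again.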

By the way, this method is only good for a quick-and-dirty implementation. It is not optimal for big files, as you get N * M complexity (N is the size of FileA, M is the size of FileB). By using a hash set you can get to nearly linear (N + M) complexity.

Example of a hash set implementation (C++11):

#include <string>
#include <iostream>
#include <fstream>
#include <unordered_set>

using namespace std;

int main() {
  string A, B;

  ifstream inA("FileA"); // input stream
  ifstream inB("FileB"); // second instream
  ofstream outA("OutA.txt"); // output stream

  unordered_set<string> setA;

  while (inA >> A) {
    setA.insert(A);
  }

  while (inB >> B) {
    if (setA.count(B)) {
      outA << B << endl; // print the matching value; A no longer holds it after the first loop
    }
  }

  return 0;
}
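Since the question asks to ignore the titles, a variant of the hash set approach could skip the first line of each file and compare the values as integers. This is only a sketch under assumptions: write_common_ids is a hypothetical helper, each input is assumed to start with exactly one header line, and every value is assumed to fit in a long. It takes generic streams so it works with files or in-memory data alike.

```cpp
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>
#include <unordered_set>

// Writes to `out` every integer ID from `inB` that also occurs in `inA`,
// one per line, in the order they appear in `inB`.
// Assumption: each stream begins with a single header line, which is skipped.
void write_common_ids(std::istream& inA, std::istream& inB, std::ostream& out) {
    const std::streamsize whole_line = std::numeric_limits<std::streamsize>::max();
    inA.ignore(whole_line, '\n');  // discard header of the first input
    inB.ignore(whole_line, '\n');  // discard header of the second input

    std::unordered_set<long> idsA;
    long id;
    while (inA >> id) {
        idsA.insert(id);           // remember every ID from the first input
    }
    while (inB >> id) {
        if (idsA.count(id)) {      // average constant-time lookup
            out << id << '\n';
        }
    }
}
```

With files, the same function could be called with two std::ifstream objects and one std::ofstream.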

Are both files small enough to read into memory?

You could try something similar to the following:

#include <algorithm>
#include <fstream>
#include <string>
#include <vector>

int main(int argc, char** argv)
{
    std::vector<std::string> a;
    std::vector<std::string> b;

    std::ofstream outA("OutA.txt"); // output stream
    std::ifstream inA("FileA");     // input stream
    std::ifstream inB("FileB");     // second input stream

    std::string value;

    inA >> value;                                 // read first line (and don't use it - discarding header)
    while (inA >> value) { a.push_back(value); }  // populate first vector
    inB >> value;                                 // read first line (and don't use it - discarding header)
    while (inB >> value) { b.push_back(value); }  // populate second vector

    // std::sort will perform a pretty efficient sort
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());

    // now that both are sorted, one pass over each is enough
    std::vector<std::string>::iterator ita = a.begin();
    std::vector<std::string>::iterator itb = b.begin();
    while (ita != a.end() && itb != b.end())
    {
        if (*ita > *itb)
            ++itb;
        else if (*ita < *itb)
            ++ita;
        else
        {
            outA << *ita << '\n';
            ++ita; // advance both, otherwise the loop never ends on a match
            ++itb;
        }
    }
    return 0;
}

This reads both files into memory, sorts them both, and then compares them. The comparison only has to go through each file once, which reduces its complexity immensely: O(a+b) instead of O(a*b). Of course the sorting has an overhead, but this should be more efficient for larger files, and for shorter files it should still be sufficiently fast (unless you are comparing lots and lots of small files). I believe that with std::sort the worst case for all this is O(a log a + b log b), which is better than O(a*b).
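The sort-then-walk comparison described above is also available in the standard library as std::set_intersection, which may be less error-prone than a hand-written loop. A minimal sketch, assuming the values have already been read into vectors (common_values is a hypothetical name):

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Returns the values present in both inputs, in sorted (lexicographic) order.
// The vectors are taken by value so the caller's data is left untouched.
std::vector<std::string> common_values(std::vector<std::string> a,
                                       std::vector<std::string> b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());

    std::vector<std::string> result;
    // Both ranges must be sorted with the same ordering for this to work.
    std::set_intersection(a.begin(), a.end(),
                          b.begin(), b.end(),
                          std::back_inserter(result));
    return result;
}
```

Because the values are compared as strings, the result is ordered lexicographically rather than numerically. That does not affect which values match.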

In the end I fixed it like so:

#include <cmath>
#include <cstdlib>
#include <string>
#include <iomanip>
#include <iostream>
#include <fstream>
#include <ctime>

using namespace std;

//Globals, to allow being called from several functions


//main program

int main() {
    string A, B;

    ifstream inA("FileA.txt"); // input stream
    ifstream inB("FileB.txt"); // second input stream
    ofstream outA("OutA.txt"); // output stream

    while (inA >> A) { // take in first stream
        while (inB >> B) { // while that's happening, take in second stream
            if (A == B) { // do they match? If so, send out the value
                outA << A << "\t" << B << endl; // this just shows that A does equal B
            }
        } // end of B loop
        inB.clear(); // now clear the second stream (B)
        inB.seekg(0, inB.beg); // return to start of stream B
    } // move on to the next input in stream A, and repeat
    return 0;
}
