
Comparing elements of text file

I am trying to check blocks of four digits read from a text file and write a new output file containing only the ones that meet one requirement: four-digit numbers in which all digits are the same.

This is my code for the input file:

#include <fstream>
using namespace std;

int main() {
    // Write the sample input file
    ofstream outfile("text.txt");
    outfile << "1111 1212 4444 \n 2222 \n \n 8888 4567" << endl;
}

I want to split this into blocks of four digits, like "1111", "1212" and so on, so that I can write only the ones that meet the requirement to the new output file. I decided to convert the whole file into an integer vector to be able to compare them.

   char digit;
   ifstream file("text.txt");
   vector <int> digits;

   while(file>>digit)
   {
      digits.push_back(digit - '0');
   }

I suppose the function that compares them would look something like this:

bool IsValid(vector<int> digits){

   for (int i=0; i<digits.size(); i++)
   {
      if(digits[0] == digits[1] == digits[2] == digits [3])
         return true; 

      else 
      {
         return false;
      }
   }
}

However, this would only compare the first block. Would you do it differently, or should I keep going with the vector idea?

Hm, everything I have seen so far is rather complicated.

Obviously you want to check for a pattern in a string, and patterns are usually matched with regular expressions.

This gives you an extremely short solution. Use std::regex. Regular expressions are part of the C++ standard library, and they are easy to use. For your case the regex is (\d)\1{3}: a digit followed by three more copies of the same digit.

The program then boils down to one statement:

#include <sstream>
#include <iostream>
#include <iterator>
#include <algorithm>
#include <string>
#include <regex>

std::istringstream testData{R"(1111 1212 444414 555
2222

8888 4567)"};

int main()
{
    std::copy_if(
        std::istream_iterator<std::string>(testData), 
        {},
        std::ostream_iterator<std::string>(std::cout,"\n"),
        [](const std::string& s){
            return std::regex_match(s,std::regex(R"((\d)\1{3})"));
        }
    );

    return 0;
}

Of course you can use a std::ifstream (or any other input stream) instead of the std::istringstream.

And of course this is only one of many possible solutions, and maybe not the best one.

"I decided to convert the whole file into an integer vector to be able to compare them."

You can then extract ints from the stream directly (file >> int_variable) and check whether they are multiples of 1111.

Suggestions in code:

#include <fstream>
#include <iomanip>
#include <iostream>
#include <vector>

bool IsValid(int number) {
    // Check that number is in the valid range and that it's a multiple of 1111.
    return number >= 0 && number <= 9999 && (number / 1111) * 1111 == number;
}

// A function to process the values in a stream
std::vector<int> process_stream(std::istream& is) {
    std::vector<int> digits;
    int number;

    while(is >> number) {
        if(IsValid(number)) // Only save valid numbers
            digits.push_back(number);
    }
    return digits;
}

int main() {
    std::vector<int> digits;

    // Check that opening the file succeeds before using it
    if(std::ifstream file = std::ifstream("text.txt")) {
        digits = process_stream(file);
    }

    // Print the collected int:s
    for(int x : digits) {
        std::cout << std::setw(4) << std::setfill('0') << x << '\n';
    }
}

Another approach is to simply handle each input as a string and then loop over each character in the string, validating that it is a digit and equal to the previous character. If it fails either test, then what was read wasn't an integer with all digits equal.

For example, you could do:

#include <iostream>
#include <sstream>
#include <string>
#include <cctype>

int main (void) {

    std::string s;
    std::stringstream ss { "1 11 1111 foo 2222\nbar 1212\n4444\n8888\n4567\n"
                            "3433333 a8\n9999999999999999999\n" };

    while (ss >> s) {                               /* read each string */
        bool equaldigits = true;                    /* flags equal digits */
        for (size_t i = 1; i < s.length(); i++)     /* loop 1 - length */
            /* validate previous & current digits & equal */
            if (!isdigit(s[i-1]) || !isdigit(s[i]) || s[i-1] != s[i]) {
                equaldigits = false;                /* if not set flag false */
                break;                              /* break loop */
            }
        /* handle empty-string or single char case */
        if (!s.length() || (s.length() == 1 && !isdigit(s[0])))
            equaldigits = false;
        if (equaldigits)                            /* if all digits & equal */
            std::cout << s << '\n';                 /* output string */
    }
}

The std::stringstream above simply provides simulated input for the program.

(Note: you can loop with std::string::iterator if you like, or use a range-based for loop and a prev char to store the last character seen. Here, it's just as easy to iterate over indexes.)

Using std::string find_first_not_of

Using existing string functions provides another way. After checking that the first character is a digit, you can use std::basic_string::find_first_not_of to scan the string for a character that isn't the same as the first. If the result isn't std::string::npos, then your string isn't all the same digit.

#include <iostream>
#include <sstream>
#include <string>
#include <cctype>

int main (void) {

    std::string s;
    std::stringstream ss { "1 11 1111 foo 2222\nbar 1212\n4444\n8888\n4567\n"
                            "3433333 a8\n9999999999999999999\n" };

    while (ss >> s) {                               /* read each string */
        if (!isdigit(s.at(0)))                      /* 1st char digit? */
            continue;
        /* if remainder of chars not equal 1st char - not equal digits */
        if (s.find_first_not_of(s.at(0)) != std::string::npos)
            continue;
        std::cout << s << '\n';
    }
}

Both approaches produce the same output.

Example Use/Output

$ ./bin/intdigitssame
1
11
1111
2222
4444
8888
9999999999999999999

There are many other ways to do this, as shown by the other good answers. It's worth understanding each approach.
