
Passing a string argument read from a file

I am trying to find a regex pattern in a text. Let's call this text the original text. The following is the code for the patternFinder() function:

vector<pair<long, long>> CaddressParser::patternFinder(string pattern)
{
    string m_text1 = m_text;
    int begin = 0;
    int end = 0;
    smatch m;
    regex e(pattern);

    vector<pair<long, long>> indices;
    if (std::regex_search(m_text1, m, e))
    {
        begin = m.position();
        end = m.position() + m.length() - 1;
        m_text1 = m.suffix().str();
        indices.push_back(make_pair(begin, end));
        while (end < m_length && std::regex_search(m_text1, m, e))
        {
            begin = end + m.prefix().length() + 1;
            end = end + m.prefix().length() + m.length();
            indices.push_back(make_pair(begin, end));
            m_text1 = m.suffix().str();
        }
        return indices;
    }

    else return indices;
}

I have the following regular expression:

"\\b[0-9]{3}\\b.*(Street).*[0-9]{5}"

The original text mentioned at the beginning is:

  • way 10.01.2013 700 West Market Street OH 35611 asdh

Only the address part, 700 West Market Street OH 35611 (shown in bold in the original post), is supposed to match the regex. The problem is that when the regex is passed as a string that has been read from a text file, patternFinder() does not recognize the pattern. When a string literal identical to the text in the file is passed directly as the argument to patternFinder(), it works. Where could this problem be coming from?

The following is the code of my fileReader() function, which I don't think is very relevant:

string CaddressParser::fileReader(string fileName)
{

    string text;
    FILE *fin;
    fin=fopen(fileName.c_str(),"rb" );
    int length=getLength(fileName);
    char *buffer= new char[length];
    fread(buffer,length,1,fin);
    buffer[length]='\0';
    text =string(buffer);
    fclose(fin);

    return text;

}  

Note that there is a syntactic difference between writing the regex directly in C++ code and reading it from a file.

In C++, the backslash character has escape semantics, so to put a literal backslash into a string literal, you must escape it with another backslash. To get the two-character string \b in memory, you have to use the string literal "\\b". The C++ compiler interprets the two backslashes as a single backslash character to be stored in the string. In other words, strlen("\\b") is 2.

On the other hand, the contents of a text file are read by your program at run time and are never processed by the C++ compiler. So to get the two characters \ and b into a string read from a file, write just the two-character string \b into the file.
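The difference can be sketched as follows. This is a minimal illustration, assuming the file contained the pattern copied verbatim from the C++ source; the helper names matches, patternFromLiteral and patternFromOverEscapedFile are hypothetical and not part of the original code, and the "file" content is simulated with a raw string literal.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` finds a match anywhere in `text`.
bool matches(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);
    return std::regex_search(text, re);
}

// The pattern as written in source code. The compiler turns each "\\"
// into one backslash, so the regex engine sees \b (a word boundary).
std::string patternFromLiteral() {
    return "\\b[0-9]{3}\\b.*(Street).*[0-9]{5}";
}

// Simulates a file (assumption) whose text was copied from the source code.
// A raw string literal keeps both backslashes, just as a file read would,
// so the regex engine sees \\b: a literal backslash followed by 'b'.
std::string patternFromOverEscapedFile() {
    return R"(\\b[0-9]{3}\\b.*(Street).*[0-9]{5})";
}

// Sanity check on the literal itself: backslash plus 'b' is two characters.
void checkLiteralLength() {
    assert(std::string("\\b").size() == 2);
}
```

With the text from the question, matches(patternFromLiteral(), text) succeeds, while matches(patternFromOverEscapedFile(), text) fails because the text contains no backslash characters.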

The problem is probably in the function that reads the string from the file. Print the string that was read and make sure the regular expression is being read correctly.
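When printing the string, it helps to make invisible characters visible, since a trailing newline or stray byte is easy to miss in normal output. The following is one possible sketch; the function name visible is hypothetical and the \xNN rendering is just a convention chosen here.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical debugging helper: keeps printable ASCII as-is and renders
// every other byte as \xNN, so newlines, NULs and stray bytes stand out.
std::string visible(const std::string& s) {
    std::string out;
    for (unsigned char c : s) {
        if (c >= 0x20 && c < 0x7f) {
            out += static_cast<char>(c);
        } else {
            char buf[5];  // backslash, 'x', two hex digits, terminator
            std::snprintf(buf, sizeof buf, "\\x%02X", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}
```

Printing visible(pattern) together with pattern.size() for both the string read from the file and the string literal shows whether the two really are identical.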

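The problem is in these two lines:

buffer[length]='\0';
text =string(buffer);

buffer was allocated with new char[length], so its valid indices run from 0 to length - 1, and buffer[length] writes one element past the end. buffer[length] should have been buffer[length - 1], or the buffer should be allocated with length + 1 elements so that the terminator does not overwrite the last character read.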
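One way to avoid the manual buffer and terminator entirely is to let the standard library manage the memory. This is a sketch of an alternative, not the original fileReader(); the names readFile and stripTrailingNewlines are hypothetical, and stripping trailing newlines is an assumption about what the pattern file may contain.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Reads the whole file into a std::string. No manual buffer, no explicit
// '\0', and no leak: std::string tracks its own length.
std::string readFile(const std::string& fileName) {
    std::ifstream in(fileName, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + fileName);
    }
    std::ostringstream contents;
    contents << in.rdbuf();
    return contents.str();
}

// Removes trailing '\r' and '\n' characters, which many editors append
// and which would otherwise become part of the regex.
std::string stripTrailingNewlines(std::string s) {
    while (!s.empty() && (s.back() == '\n' || s.back() == '\r')) {
        s.pop_back();
    }
    return s;
}
```

The pattern could then be obtained as stripTrailingNewlines(readFile(fileName)) before it is passed to patternFinder().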
