C++: search a line in a file for certain words, then insert a word after those words

Question

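I am pretty new to C++, and I have been struggling to figure out how to approach this problem. Basically, I need to read from a file, find every instance of an article ("a", "A", "an", "aN", "An", "AN", "the", "The", and so on), and then insert an adjective after that article. The capitalization of the adjective must be based on the word that originally followed the article. For example, if the article is followed by "SHARK", the result needs to be "HAPPY SHARK". Can anyone tell me the best way to do this? So far I have scrapped a lot of ideas, and this is what I have now, though I don't think I can do it this way: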

#include <iostream>
#include <string>
#include <cctype>
#include <fstream>
#include <sstream>

using namespace std;

void
usage(char *progname, string msg){
    cerr << "Error: " << msg << endl;
    cerr << "Usage is: " << progname << " [filename]" << endl;
    cerr << " specifying filename reads from that file; no filename reads standard input" << endl;
}

int main(int argc, char *argv[])
{
    string adj;
    string file;
    string line;
    string articles[14] = {"a","A","an","aN","An","AN","the","The","tHe","thE","THe","tHE","ThE","THE"};
    ifstream rfile;
    cin >> adj;
    cin >> file;
    rfile.open(file.c_str());
    if(rfile.fail()){
        cerr << "Error while attempting to open the file." << endl;
        return 0;
    }
    while(rfile.good()){
        getline(rfile,line,'\n');
        istringstream iss(line);
        string word;
        while(iss >> word){
            for(int i = 0; i < 14; i++){
                if(word == articles[i]){
                    cout << word + " " << endl;
                }else{
                    continue;
                }
            }
        }
    }
}

Answer 1

So far so good, although if you need to handle an article at the end of a line, doing this line by line may be troublesome.

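Anyway, ignoring that wrinkle for a moment: after you have matched an article, you first need to read the next word, which is the one the capitalization is based on. Then you need to create a new version of the adjective string with the right capitalization: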

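string adj_buf;  // copy of adj, re-cased to match the word after the article

while(iss >> word){
    for(int i = 0; i < 14; i++){   // articles has 14 elements: valid indices are 0..13
        if(word == articles[i]){
            cout << word + " ";
            if (!(iss >> word))    // the article was the last word on this line
                break;             // TODO: handle an article at the end of a line
            adj_buf = adj;
            // Give each letter of adj the case of the letter at the same position in word
            for (size_t j = 0; j < word.size() && j < adj.size(); ++j)
                if (isupper(static_cast<unsigned char>(word[j])))
                    adj_buf[j] = toupper(static_cast<unsigned char>(adj[j]));
                else
                    adj_buf[j] = tolower(static_cast<unsigned char>(adj[j]));

            cout << adj_buf + " " + word;
            break;
        }
    }
}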
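
The per-letter case copy in the loop above can also be pulled out into a function, which makes it easy to try on a few words. This is only a sketch: the name match_case is hypothetical and does not come from the original post.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original post): returns a copy of adj in
// which each letter takes the case of the letter at the same index in word.
// Positions past the end of word keep whatever case adj already had.
std::string match_case(const std::string& adj, const std::string& word) {
    std::string result = adj;
    for (std::size_t j = 0; j < word.size() && j < adj.size(); ++j) {
        // The <cctype> functions expect a value representable as unsigned char.
        const unsigned char w = static_cast<unsigned char>(word[j]);
        const unsigned char a = static_cast<unsigned char>(adj[j]);
        result[j] = static_cast<char>(std::isupper(w) ? std::toupper(a)
                                                       : std::tolower(a));
    }
    return result;
}
```

With this sketch, match_case("happy", "SHARK") gives "HAPPY" and match_case("happy", "Shark") gives "Happy". When the following word is shorter than the adjective, only the overlapping positions are re-cased, so match_case("HAPPY", "cat") gives "hapPY".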

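Now back to the wrinkle we ignored. You probably don't want to go line by line and then token by token, because handling that special case will make your control flow ugly. Instead, you probably want to go token by token in a single loop.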

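So you need to write a helper function or class that operates on the file and can hand you the next token. (I'm not sure whether the STL already has such a class.) Anyway, using your I/O it could look something like this: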

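struct FileTokenizer
{
    FileTokenizer(const string &fileName) : rfile(fileName.c_str()) {}

    // Stores the next whitespace-separated token in token.
    // Returns false once the file has no more tokens.
    bool getNextToken(string &token)
    {
        while (!(iss >> token))
        {
            string line;

            if (!getline(rfile, line))  // no more lines in the file
                return false;

            iss.clear();    // clear the eof/fail flags left by the failed extraction
            iss.str(line);  // make the new line the contents of the stream
        }

        return true;
    }

private:
    ifstream      rfile;
    istringstream iss;
};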

Then your main loop looks like this:

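FileTokenizer tokenizer(file);
string word;

while (tokenizer.getNextToken(word))
{
    for(int i = 0; i < 14; i++){   // articles has 14 elements: valid indices are 0..13
        if(word == articles[i]){
            cout << word + " ";

            if (!tokenizer.getNextToken(word))  // the article was the last token in the file
                break;

            adj_buf = adj;
            // Give each letter of adj the case of the letter at the same position in word
            for (size_t j = 0; j < word.size() && j < adj.size(); ++j)
                if (isupper(static_cast<unsigned char>(word[j])))
                    adj_buf[j] = toupper(static_cast<unsigned char>(adj[j]));
                else
                    adj_buf[j] = tolower(static_cast<unsigned char>(adj[j]));

            cout << adj_buf + " " + word;
            break;
        }
    }
}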

You probably want to output the rest of the input too?

Answer 2

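First, I would suggest using three helper functions to convert the case of a string. They will come in handy if you work with text a lot. Here they are based on <algorithm>, but many other approaches are possible: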

string strtoupper(const string& s) {   // return the uppercase of the string
    string str = s; 
    std::transform(str.begin(), str.end(), str.begin(), ::toupper);
    return str; 
}
string strtolower(const string& s) {    // return the lowercase of the string
    string str = s;
    std::transform(str.begin(), str.end(), str.begin(), ::tolower);
    return str;
}
string strcapitalize (const string& s) {  // return the capitalisation (1 upper, rest lower) of the string
    string str = s;
    std::transform(str.begin(), str.end(), str.begin(), ::tolower);
    if (str.size() > 0)
        str[0] = toupper(str[0]); 
    return str;
}

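Then comes a utility function that clones the capitalization of a word: it returns the adjective in lowercase, in uppercase, or capitalized (first letter upper, the rest lower), copying the case of the reference word. It is robust enough to handle empty words and words that do not start with a letter: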

string clone_capitalisation(const string& a, const string& w) {
    if (w.size() == 0 || !isalpha(w[0]))  // empty or not a letter
        return a;                         //   => use adj as it is
    else {
        if (islower(w[0]))   // lowercase
            return strtolower(a);
        else return w.size() == 1 || isupper(w[1]) ? strtoupper(a) : strcapitalize(a);
    }
}

None of these functions change the original string!

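Now to main(): I don't like having to type in every possible upper and lower case combination of the articles by hand, so I only work with uppercase.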

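I also don't like going through every possible article in turn for each word. If there were more articles, that would not perform well! So I prefer to use <set>: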

...
set<string> articles  { "A", "AN", "THE" };   // shorter isn't it ? 
...
while (getline(rfile, line)) {
    istringstream iss(line);
    string word;
    while (iss >> word) {     // loop 
        cout << word << " ";  // output the word in any case
        if (articles.find(strtoupper(word))!=articles.end()) {  // article found ?
            if (iss >> word) {  // then read the next word
                cout << clone_capitalisation(adj, word) << " " << word << " ";
            }
            // no next word on this line: the article was already printed above
        }
    }
    cout << endl; 
}
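
To try this approach without a file, the per-line logic can be wrapped in a function that returns the rewritten line. This is a sketch under assumptions: process_line, clone_case, to_upper and to_lower are hypothetical names, and the helpers are condensed stand-ins for the ones in this answer, repeated only so the block compiles on its own.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <sstream>
#include <string>

// Condensed stand-ins for strtoupper / strtolower from the answer.
std::string to_upper(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return s;
}
std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Same decision rule as clone_capitalisation: lowercase, uppercase or
// capitalized, depending on the first two characters of w.
std::string clone_case(const std::string& adj, const std::string& w) {
    if (w.empty() || !std::isalpha(static_cast<unsigned char>(w[0])))
        return adj;                       // nothing to copy the case from
    if (std::islower(static_cast<unsigned char>(w[0])))
        return to_lower(adj);
    if (w.size() == 1 || std::isupper(static_cast<unsigned char>(w[1])))
        return to_upper(adj);
    std::string cap = to_lower(adj);      // capitalized: first upper, rest lower
    if (!cap.empty())
        cap[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(cap[0])));
    return cap;
}

// Hypothetical function: rewrites one line, inserting adj after each article
// that is followed by another word on the same line.
std::string process_line(const std::string& line, const std::string& adj) {
    static const std::set<std::string> articles = {"A", "AN", "THE"};
    std::istringstream iss(line);
    std::ostringstream out;
    std::string word;
    bool first = true;
    while (iss >> word) {
        if (!first)
            out << ' ';
        first = false;
        out << word;                      // output the word in any case
        if (articles.count(to_upper(word)) && (iss >> word))
            out << ' ' << clone_case(adj, word) << ' ' << word;
    }
    return out.str();
}
```

For example, process_line("The shark ate a FISH", "happy") returns "The happy shark ate a HAPPY FISH". Because the work is done per line, an article at the end of a line is left without an adjective, which is the limitation the first answer points out.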

2 answers

Answer 1
1 Accepted 2015-02-13 20:17:02

Answer 2
0 2015-02-13 21:06:27
