
Building C++ translator with given dictionary?

I am trying to build a simple translator that translates sentences based on a given dictionary. Let's assume that we have two arrays of words:

string ENG[] = {"black","coffee", "want","yesterday"};
string SPA[] = {"negro", "café", "quiero", "ayer"};

If the user enters "I want a black coffee." the result should be "I? quiero a? negro café." That is, every word that has no translation in the dictionary should be followed by a question mark.

#include <iostream>
#include <string>
using namespace std;

int main(int argc, char *argv[]) {

  string input;
  string ENG[] = {"black", "coffee", "want", "yesterday"};
  string SPA[] = {"negro", "café", "quiero", "ayer"};

  cout << "Enter a word";
  cin >> input;

  for (int i = 0; i < 4; ++i) {  // the arrays hold 4 entries
    if (ENG[i] == input) {
      cout << "You entered " << SPA[i] << endl;
    }
  }
  return 0;
}

What I have written translates only single words. How can I extend this code to handle whole sentences?

Here you go.

#include <iostream>
#include <string>
#include <vector>

using namespace std;

vector <string> split_sentence(const string& arg)
{

    vector <string> ret;

    auto it = arg.begin();
    while (it != arg.end()) {

        string tmp;

        while (it != arg.end() && *it == ' ') ++it;
        while (it != arg.end() && *it != ' ')
            tmp += *it++;

        if (tmp.size())
            ret.push_back(tmp);
    }

    return ret;
}

int main(int argc, char *argv[])
{
    string input = "I want a black     coffee .";

    string ENG[4] = {"black","coffee", "want","yesterday"};
    string SPA[4] = {"negro", "café", "quiero", "ayer"};

    cout << "Enter sentence\n";
    /*
        getline(cin, input);  // cin >> input would read only the first word
    */

    for (auto& str: split_sentence(input)) {

        bool found = false;

        for (int j=0; j<4 && !found; ++j) {

            if (ENG[j] == str) {
                cout << SPA[j] << " ";
                found = true;
            }
        }

        if (!found)
            cout << str << "? ";
    }

    cout << endl;
}

Output:

Enter sentence
I? quiero a? negro café .?

Split the sentence on spaces and then look up each word in the dictionary. If your dictionary is large, a linear scan becomes slow; use a tree-like data structure, a sorted array with binary search, or hashing to speed up the lookups.

Edit:

A trie will be faster for this. For each query you can get the appropriate word in O(m), where m is the length of the query (the English word).

As suggested in the comments, two separate arrays are really cumbersome to use for this and hard to update. Imagine inserting a new value pair in the middle and messing with the offsets…

So the far better solution here would be using a std::map, especially considering this is supposed to be a simple 1:1 mapping.

As such you can define a std::map using a std::string as the key (the original word) and a std::string as its value (the translation).

When using modern C++, the initialization could look like this:

std::map<std::string, std::string> translations {
    {"black", "negro"},
    {"coffee", "café"},
    // ...
};

Now as for getting your input string word by word, the quickest built-in way would be using std::istringstream:

std::istringstream stream(myInputText);
std::string word;

while (stream >> word) {
    // do something with each word
}

Looking up the actual translations becomes trivial as well. The search happens in the background, inside the std::map class, which finds a key in logarithmic time:

const auto res = translations.find(word);

if (res == translations.end()) // nothing found: keep the word and mark it
    std::cout << word << "? ";
else
    std::cout << res->second << " "; // `res->second` is the value, `res->first` would be the key, i.e. `word`

As for a full tiny example:

#include <iostream>
#include <string>
#include <sstream>
#include <map>

int main(int argc, char **argv) {
    std::map<std::string, std::string> translations {
        {"black", "negro"},
        {"coffee", "café"}
    };

    std::string source("I'd like some black coffee");
    std::istringstream stream(source);
    std::string word;

    while (stream >> word) {
        const auto t = translations.find(word);

        if (t != translations.end()) // found
            std::cout << word << ": " << t->second << "\n";
        else
            std::cout << word << ": ???\n";
    }

    return 0;
}

This specific example would create the following output:

I'd: ???
like: ???
some: ???
black: negro
coffee: café
