
Reading all the words in a text file in C++

I have a large .txt file and I want to read all of the words inside it and print them on the screen. The first thing I did was to use std::getline() in this way:

  std::vector<std::string> words;
  std::string line;
  while(std::getline(std::cin,line)){
    words.push_back(line);
  }

and then I printed out all the elements of the vector words. The .txt file is passed on the command line as ./a.out < myTxt.txt.

The problem is that each element of the vector is a whole line, so I am not reading individual words.

The problem, I guess, is the spaces between words: how can I tell the code to ignore them? More specifically, is there any function that I can use in order to read each word from a .txt file?

UPDATE:

I'm trying to skip punctuation: not only commas and periods (, .) but also ? ! ( ). I used find_first_of(), but my program doesn't work. Also, I don't know how to specify which characters I don't want to be read, i.e. '.', ',', '?', '!' and so on.

  std::vector<std::string> my_vec;
  std::string line;
  while(std::cin>>line){
    std::size_t pos = line.find_first_of("!");
    std::string line = line.substr(pos);
    my_vec.push_back(line);
  }

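The >> operator for std::string does exactly what you need: it skips leading whitespace and then reads characters up to the next whitespace, i.e. one word at a time.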

  std::vector<std::string> words;
  std::string line;
  while (std::cin >> line) {
    words.push_back(line);
  }

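If you need to remove unwanted characters such as ',' or '.', you can first replace them with spaces and then split the line on whitespace:

#include <algorithm>
#include <cctype>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

int main() {
  std::vector<std::string> words;
  std::string line;
  while (std::getline(std::cin, line)) {
    // Replace every non-alphanumeric character with a space.
    // std::isalnum needs a non-negative value, hence unsigned char.
    std::transform(line.begin(), line.end(), line.begin(),
       [](unsigned char c) { return std::isalnum(c) ? static_cast<char>(c) : ' '; });
    // Split the cleaned line on whitespace.
    std::stringstream linestream(line);
    std::string w;
    while (linestream >> w) {
      std::cout << w << "\n";
      words.push_back(w);
    }
  }
}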

Reference: cppreference

As its name suggests, std::getline() reads a whole line at a time. You can split each line on whitespace after reading it, or you can read word by word using operator>>:

string word;
while (cin >> word){
    cout << word << "\n";
    words.push_back(word);        
}

Use operator>> instead of std::getline() . The operator will read individual whitespace-separated substrings for you.

#include <iostream>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> my_vec;
    std::string s;
    while (std::cin >> s){
        // s holds one whitespace-separated word; use it as needed...
        my_vec.push_back(s);
    }
}

However, you may still end up with strings that contain punctuation without any surrounding whitespace, such as hello,world, so you will have to split those strings manually, for example:

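#include <iostream>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> my_vec;
    std::string s;
    while (std::cin >> s){
        std::string::size_type start = 0, pos;
        // Cut s at each punctuation character.
        while ((pos = s.find_first_of(".,?!()", start)) != std::string::npos){
            if (pos > start)  // skip the empty piece produced by leading punctuation
                my_vec.push_back(s.substr(start, pos-start));
            // Resume at the next character that is neither punctuation nor whitespace.
            start = s.find_first_not_of(".,?!() \t\f\r\n\v", pos+1);
        }
        if (start == 0)
            my_vec.push_back(s);                // no punctuation in s at all
        else if (start != std::string::npos)
            my_vec.push_back(s.substr(start));  // piece after the last punctuation
    }
    for (const auto& w : my_vec)
        std::cout << w << "\n";
}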
