How do I split a text file into words?

Question

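I am working on an assignment in which I am supposed to read a file and count the number of lines while also counting the words in it. I tried a combination of getline and strtok inside a while loop, but it did not work.

File: example.txt (the file to be read).

Hi, hello this is a surprise.
Welcome to this place.
May you have a pleasant stay here.
(3 lines, with some words.)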

Readfile.cpp

#include <iostream>
#include <fstream>
#include<string>
using namespace std;
int main()
{
  ifstream in("example.txt");
  int count = 0;

  if(!in)
  {
    cout << "Cannot open input file.\n";
    return 1;
  }

  char str[255];
  string tok;
  char * t2;

  while(in)
  {
    in.getline(str, 255);
    in>>tok;
    char *dup = strdup(tok.c_str());
    do 
    {
        t2 = strtok(dup," ");
    }while(t2 != NULL);
    cout<<t2<<endl;
    free (dup);
    count++;
  }
  in.close();
  cout<<count;
  return 0;
}

Answer 1

Just got it working! I simply removed all the unnecessary code.

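#include <cstring>
#include <fstream>
#include <iostream>
using namespace std;

// AddWord() and showData() are the poster's own helpers (not shown here).

int main()
{
    ifstream in("example.txt");
    int LineCount = 0;
    char str[500];

    // Test the read itself, so a failed read at end of file
    // is not counted as an extra line.
    while(in.getline(str, sizeof str))
    {
        LineCount++;
        // Use the same delimiter set on the first and later strtok calls.
        char * tempPtr = strtok(str," ,.");
        while(tempPtr)
        {
            AddWord(tempPtr, LineCount);
            tempPtr = strtok(NULL," ,.");
        }
    }
    in.close();
    cout<<"Total No of lines:"<<LineCount<<endl;
    showData();

    return 0;
}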

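BTW, the original problem statement was to create an indexing program that takes a user's file and builds a line index of all the words in it.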

Answer 2

I haven't tried compiling this, but here is an alternative that is almost as simple as using Boost, without the extra dependency.

#include <iostream>
#include <sstream>
#include <string>

int main() {
  std::string line;
  while (std::getline(std::cin, line)) {
    std::istringstream linestream(line);
    std::string word;
    while (linestream >> word) {
      std::cout << word << "\n";
    }
  }
  return 0;
}
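The same getline plus istringstream pattern can be adapted to the counting task from the question. The following is a minimal sketch, assuming that a "word" is any whitespace-separated token; `Counts` and `count_lines_and_words` are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical result type: totals for one input stream.
struct Counts {
    std::size_t lines = 0;
    std::size_t words = 0;
};

// Hypothetical helper: count lines and whitespace-separated words.
Counts count_lines_and_words(std::istream& in) {
    Counts c;
    std::string line;
    // getline is the loop condition, so a failed read is never counted.
    while (std::getline(in, line)) {
        ++c.lines;
        std::istringstream linestream(line);
        std::string word;
        while (linestream >> word) {
            ++c.words;
        }
    }
    return c;
}
```

To use it on the file from the question, open a std::ifstream on example.txt and pass it to count_lines_and_words. Punctuation stays attached to words under this assumption, so "surprise." counts as one word.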

Answer 3

ifstream is {"my_file_path"}; 
vector<string> b {istream_iterator<string>{is},istream_iterator<string>{}};

Don't forget to include this:

<iterator>
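For completeness, here is a sketch of the same iterator idea with every header it needs. `read_words` and `split_words` are hypothetical helper names, and the stream is passed in as a parameter so the code is not tied to one file path.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read every whitespace-separated word from a stream.
std::vector<std::string> read_words(std::istream& is) {
    return std::vector<std::string>(std::istream_iterator<std::string>(is),
                                    std::istream_iterator<std::string>());
}

// Hypothetical convenience wrapper: split an in-memory string the same way.
std::vector<std::string> split_words(const std::string& text) {
    std::istringstream is(text);
    return read_words(is);
}
```

Note that this approach discards line boundaries, so it gives the words but not the line count that the question also asks for.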

Answer 4

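Try moving your cout<<t2<<endl; statement into your while loop.

That should make your code basically functional.

You may also want to look at similar posts for other approaches.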
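One caveat about the strtok loop in the question: strtok takes the buffer only on the first call and a null pointer on later calls, so calling strtok(dup, " ") repeatedly keeps returning the first token and the loop never sees NULL. A minimal sketch of the usual pattern follows; `tokenize_line` is a hypothetical helper, and the default delimiter set " ,." is an assumption.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: split one line into tokens with strtok.
std::vector<std::string> tokenize_line(const std::string& line,
                                       const char* delims = " ,.") {
    // strtok modifies its input, so work on a private, null-terminated copy.
    std::vector<char> buf(line.begin(), line.end());
    buf.push_back('\0');

    std::vector<std::string> tokens;
    // First call passes the buffer; later calls pass nullptr to continue.
    for (char* t = std::strtok(buf.data(), delims); t != nullptr;
         t = std::strtok(nullptr, delims)) {
        tokens.push_back(t);
    }
    return tokens;
}
```

Each token is used inside the loop, which is where the print statement from the question would go.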

Answer 5

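There are plenty of examples like this posted on the internet. Here is a word-counting program I wrote back in high school. Use it as a starting point. Other things I would like to point out:

std::stringstream: you std::getline the whole line, then use std::stringstream to chop it into smaller pieces and tokenize it. That is, you read the whole line into a std::string with std::getline, and then pass that string to a std::stringstream.

Again, this is only an example and will not do exactly what you want; you will need to modify it yourself to make it do what you want!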

#include <iostream>
#include <map>
#include <string>
#include <cmath>
#include <fstream>

// Global variables
std::map<std::string, int> wordcount;   // occurrences of each word
unsigned int numcount;                  // total number of words read

// Record one occurrence of a word.
void addEntry (const std::string &entry) {
        wordcount[entry]++;
        numcount++;
}

// Print every word used more often than 1% of the total word count.
void returnCount () {
        double percentage = numcount * 0.01;
        percentage = std::floor(percentage + 0.5);

        std::map<std::string, int>::iterator Iter;

        for (Iter = wordcount.begin(); Iter != wordcount.end(); ++Iter) {
                if (Iter->second > percentage) {
                        std::cout << Iter->first << " used " << Iter->second << " times" << std::endl;
                }
        }
}

int main(int argc, char *argv[]) {
        if (argc != 2) {
                std::cerr << "Please call the program as follows: \n\t" << argv[0]
                        << " <file name>" << std::endl;
                return 1;
        }

        std::string data;

        std::ifstream fileRead(argv[1]);
        if (!fileRead) {
                std::cerr << "Cannot open " << argv[1] << std::endl;
                return 1;
        }
        // operator>> splits on whitespace, so each read yields one word.
        while (fileRead >> data) {
                addEntry(data);
        }
        std::cout << "Total words in this file: " << numcount << std::endl;
        std::cout << "Words that make up more than 1% of the file: " << std::endl;
        returnCount();
        return 0;
}

Answer 6

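If you can use the Boost libraries, I suggest boost::tokenizer:

The Boost Tokenizer package provides a flexible and easy-to-use way to break a string or other character sequence into a series of tokens. Below is a simple example that breaks a phrase into words.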
// simple_example_1.cpp
#include<iostream>
#include<boost/tokenizer.hpp>
#include<string>

int main(){
   using namespace std;
   using namespace boost;
   string s = "This is, a test";
   tokenizer<> tok(s);
   for(tokenizer<>::iterator beg=tok.begin(); beg!=tok.end(); ++beg){
       cout << *beg << "\n";
   }
}

6 answers

Answer 1
5 2009-03-16 13:29:22

Answer 2
3 2009-03-18 03:20:54

Answer 3
0 2013-12-25 00:24:02

Answer 4
0 2009-03-16 06:27:55

Answer 5
0 2009-03-16 06:30:36

Answer 6
0 2009-03-16 13:51:30
