Is there any way to get a "\n" from a stream?

Question

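I am trying to take a file and convert it into a data structure (a text is an "array" of paragraphs, a paragraph is an "array" of sentences, and a sentence is an "array" of words, which are char*).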

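To make things easier for myself I am using a stream (an ifstream, to be precise), but one of the problems I ran into is determining where a paragraph ends (two consecutive '\n' characters count as the end of a paragraph). The simple way would be to process the text character by character and check whether each one is a space or a '\n', but that is long and painful.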

The code looks like this:

    std::ifstream fd(filename);
    char buffer[128];

    while(fd >> buffer)
    {
        /* Some code in here that does things with buffer */
    }

This works, but it completely ignores paragraph boundaries. fd.get(buffer, 128, '\n') does not work as needed either: after the first read it cuts everything off.

So, is there an easier way to do this than reading char by char? The assignment forbids us from using vector or string, so getline() is not an option.

UPDATE

So it seems std::istream::getline can solve this for me, but the result is still not what I expected. It reads the first line, and then something strange happens.

The code looks like this:

std::ifstream fd(fl);
char buffer[128];
fd.getline(buffer, 128);
std::cout << "555 - [" << buffer << "]" << std::endl;
std::cout << fd.gcount() << std::endl;
fd.getline(buffer, 128);
std::cout << "777 - [" << buffer << "]" << std::endl;
std::cout << fd.gcount() << std::endl;

The output looks like this:

]55 - [text from file
23
]77 - [
2

And yes, I don't think I understand what is happening.

Answer 1

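As I understand it, you may not use any std containers.

So I think one possible approach is:

Read the entire file into a buffer
Tokenize the buffer into paragraphs
Tokenize each paragraph into sentences
Tokenize each sentence into words

For the first part, you can use: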

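#include <cstddef>
#include <cstring>
#include <fstream>
#include <iostream>
#include <new>

//! Reads a file into a null-terminated buffer that the caller must delete[] afterwards
char* readFile(const char *filename) {
  // Binary mode keeps the bytes as stored, so "\r\n" line endings are not converted to "\n"
  std::ifstream ifs(filename, std::ifstream::binary);

  if (!ifs.good()) // Check the stream, not the filename
    return NULL;

  ifs.seekg(0, ifs.end);
  std::streamoff size = ifs.tellg();
  if (size < 0) // tellg() returns -1 on failure
    return NULL;
  size_t len = static_cast<size_t>(size);
  ifs.seekg(0, ifs.beg);

  char* buffer = new (std::nothrow) char[len + 1]; // +1 for the terminating '\0'
  if (!buffer) // Plain new throws instead of returning NULL, hence std::nothrow
    return NULL;

  if (!ifs.read(buffer, static_cast<std::streamsize>(len))) { // Check that the entire file was read
    delete[] buffer;
    return NULL;
  }
  buffer[len] = '\0'; // The tokenizing code relies on C-string functions
  return buffer;      // ifs is closed by its destructor
}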

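With that function in place, all we need to do now is use it and tokenize the string. For that we have to define our types (based on linked lists, written in a C style):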

struct Word {
  char *contents;
  Word *next;
};

struct Sentence {
  Word *first;
  Sentence *next;
};

struct Paragraph {
  Sentence *first;
  Paragraph *next;
};

struct Text {
  Paragraph *first;
};

With the types defined, we can now start reading the text:

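//! Splits a sentence into as many Word elements as possible (uses <cstring> functions)
void readSentence(char *buffer, size_t len, Word **target) {
    *target = NULL; // Leave a terminated list even when nothing is read
    if (!buffer || len == 0) return;

    buffer += strspn(buffer, " \t\r\n"); // Skip leading whitespace so no empty words are created
    if (*buffer == '\0') return;

    *target = new Word;
    (*target)->next = NULL;

    char *end = strpbrk(buffer, " \t\r\n");
    size_t n = (end != NULL) ? (size_t)(end - buffer) : strlen(buffer);

    // Always allocate with new[] (instead of the non-standard _strdup) so delete[] frees every word
    (*target)->contents = new char[n + 1];
    memcpy((*target)->contents, buffer, n);
    (*target)->contents[n] = '\0';

    if (end != NULL)
        readSentence(end + 1, strlen(end + 1), &(*target)->next);
}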

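//! Splits a paragraph from a text buffer into as many Sentence elements as possible
void readParagraph(char *buffer, size_t len, Sentence **target) {
    *target = NULL; // Leave a terminated list even when nothing is read
    if (!buffer || *buffer == '\0' || len == 0) return;

    size_t skip = strspn(buffer, " \t\r\n"); // Skip whitespace left over between sentences
    if (skip >= len) return;
    buffer += skip;
    len -= skip;

    *target = new Sentence;
    (*target)->first = NULL;
    (*target)->next = NULL;

    char *end = strpbrk(buffer, ".;:?!");

    if (end != NULL) {
        // Copy the sentence, including its closing punctuation mark
        char *t = new char[end - buffer + 2];
        strncpy(t, buffer, end - buffer + 1);
        t[end - buffer + 1] = '\0';
        readSentence(t, (size_t)(end - buffer + 1), &(*target)->first);
        delete[] t;

        readParagraph(end + 1, len - (end - buffer + 1), &(*target)->next);
    }
    else {
        readSentence(buffer, len, &(*target)->first);
    }
}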

//! Splits as many Paragraph elements as possible from a text buffer
void readText(char *buffer, Paragraph **target) {
    *target = NULL; // Leave a terminated list even when nothing is read
    if (!buffer) return;

    buffer += strspn(buffer, "\r\n"); // Skip extra blank lines before the paragraph
    if (*buffer == '\0') return;

    *target = new Paragraph;
    (*target)->first = NULL;
    (*target)->next = NULL;

    // A blank line ("\n\n") marks the end of a paragraph; pass the paragraph to the sentence parser
    char *end = strstr(buffer, "\n\n");
    if (end != NULL) {
        char *t = new char[end - buffer + 1];
        strncpy(t, buffer, end - buffer);
        t[end - buffer] = '\0';
        readParagraph(t, (size_t)(end - buffer), &(*target)->first);
        delete[] t;

        readText(end + 2, &(*target)->next);
    }
    else {
        readParagraph(buffer, strlen(buffer), &(*target)->first);
    }
}

Text* createText(char *contents) {
    Text *text = new Text;
    text->first = NULL; // Stays NULL when contents is NULL or empty
    readText(contents, &text->first);
    return text;
}

For example, you could use it like this:

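int main(int argc, char **argv) {
    char *buffer = readFile("mytext.txt");
    if (buffer == NULL) { // readFile returns NULL when the file cannot be read
        std::cerr << "Could not read mytext.txt" << std::endl;
        return 1;
    }
    Text *text = createText(buffer);
    delete[] buffer; // Safe: every word was copied out of the buffer

    // Print each paragraph on its own line, followed by a blank line
    for (Paragraph* p = text->first; p != NULL; p = p->next) {
        for (Sentence* s = p->first; s != NULL; s = s->next) {
            for (Word* w = s->first; w != NULL; w = w->next) {
                std::cout << w->contents << " ";
            }
        }
        std::cout << std::endl << std::endl;
    }

    // Note: the lists built by createText are not freed here
    return 0;
}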

Keep in mind that I have not tested this code, so it may or may not work.

Sources:

http://www.cplusplus.com/reference/

Solution 1
1 accepted 2014-04-15 18:43:26
