Is there any way to get '\n' from streams?
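I'm trying to take a file and convert it into a data structure (a text is an "array" of paragraphs, a paragraph is an "array" of sentences, a sentence is an "array" of words, and words are char*).
To make my life easier I'm using streams (ifstream, to be precise), but one of the problems I ran into is determining where a paragraph ends (two '\n' in a row count as the end of a paragraph). A simple approach would be to process the text character by character and check each one for a space or '\n', but that is long and painful.
The code looks like this: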
std::ifstream fd(filename);
char buffer[128];
while(fd >> buffer)
{
/* Some code in here that does things with buffer */
}
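It works, but it completely ignores paragraph boundaries. fd.get(buffer, 128, '\n') doesn't work as needed either: after one read it cuts everything off.
So, is there an easier way to do this than reading char by char? getline() can't be used, because the assignment forbids vectors and strings.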
UPDATE
So it seems std::istream::getline can solve this for me, but it's still not what I expected. It reads the first line, and then something strange happens.
The code looks like this:
std::ifstream fd(fl);
char buffer[128];
fd.getline(buffer, 128);
std::cout << "555 - [" << buffer << "]" << std::endl;
std::cout << fd.gcount() << std::endl;
fd.getline(buffer, 128);
std::cout << "777 - [" << buffer << "]" << std::endl;
std::cout << fd.gcount() << std::endl;
The output looks like this:
]55 - [text from file
23
]77 - [
2
And yes, I don't think I understand what is happening.
As far as I understand, you may not use any std containers.
So I think this approach is possible:
For the first part, you can use:
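#include <fstream>
#include <new> // std::nothrow

//! Reads a whole file into a NUL-terminated buffer; the caller must delete[] it afterwards
char* readFile(const char *filename) {
    std::ifstream ifs(filename, std::ifstream::binary);
    if (!ifs.good()) // the stream, not the filename, reports whether the open succeeded
        return NULL;
    ifs.seekg(0, ifs.end);
    std::streamoff len = ifs.tellg();
    ifs.seekg(0, ifs.beg);
    if (len < 0) // tellg() returns -1 on failure
        return NULL;
    // +1 for the terminating '\0' that strstr/strpbrk rely on later;
    // nothrow makes a failed allocation return NULL instead of throwing
    char* buffer = new (std::nothrow) char[static_cast<size_t>(len) + 1];
    if (!buffer) // Check for failed allocation
        return NULL;
    ifs.read(buffer, len);
    if (ifs.gcount() != len) { // Check if the entire file was read
        delete[] buffer;
        return NULL;
    }
    buffer[len] = '\0';
    return buffer; // the ifstream closes itself when it goes out of scope
}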
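With that function ready, all that's left is to use it and tokenize the string. For that we have to define our types (based on linked lists, written in C coding style):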
struct Word {
char *contents;
Word *next;
};
struct Sentence {
Word *first;
Sentence *next;
};
struct Paragraph {
Sentence *first;
Paragraph *next;
};
struct Text {
Paragraph *first;
};
With the types defined, we can now start reading the text:
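#include <cstring> // strpbrk, strspn, strncpy, strcpy, strlen, strstr
//! Splits a sentence in as many Word elements as possible
void readSentence(char *buffer, size_t len, Word **target) {
if (!buffer || len == 0) return;
buffer += strspn(buffer, " \t\r\n"); // skip leading whitespace so it does not produce empty words
if (*buffer == '\0') return;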
*target = new Word;
(*target)->next = NULL;
char *end = strpbrk(buffer, " \t\r\n");
if (end != NULL) {
(*target)->contents = new char[end - buffer + 1];
strncpy((*target)->contents, buffer, end - buffer);
(*target)->contents[end - buffer] = '\0';
readSentence(end + 1, strlen(end + 1), &(*target)->next);
}
else {
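(*target)->contents = new char[strlen(buffer) + 1]; // portable replacement for the MSVC-only _strdup
strcpy((*target)->contents, buffer);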
}
}
//! Splits a paragraph from a text buffer in as many Sentence as possible
void readParagraph(char *buffer, size_t len, Sentence **target) {
if (!buffer || *buffer == '\0' || len == 0) return;
*target = new Sentence;
(*target)->first = NULL; // readSentence may return without setting it
(*target)->next = NULL;
char *end = strpbrk(buffer, ".;:?!");
if (end != NULL) {
char *t = new char[end - buffer + 2];
strncpy(t, buffer, end - buffer + 1);
t[end - buffer + 1] = '\0';
readSentence(t, (size_t)(end - buffer + 1), &(*target)->first);
delete[] t;
readParagraph(end + 1, len - (end - buffer + 1), &(*target)->next);
}
else {
readSentence(buffer, len, &(*target)->first);
}
}
//! Splits as many Paragraph as possible from a text buffer
void readText(char *buffer, Paragraph **target) {
if (!buffer || *buffer == '\0') return;
*target = new Paragraph;
(*target)->first = NULL; // readParagraph may return without setting it
(*target)->next = NULL;
char *end = strstr(buffer, "\n\n"); // With this, we have a pointer to the end of a paragraph. Pass to our sentence parser.
if (end != NULL) {
char *t = new char[end - buffer + 1];
strncpy(t, buffer, end - buffer);
t[end - buffer] = '\0';
readParagraph(t, (size_t)(end - buffer), &(*target)->first);
delete[] t;
readText(end + 2, &(*target)->next);
}
else
readParagraph(buffer, strlen(buffer), &(*target)->first);
}
Text* createText(char *contents) {
Text *text = new Text;
text->first = NULL; // stays NULL when contents is NULL or empty
readText(contents, &text->first);
return text;
}
For example, you could use it like this:
int main(int argc, char **argv) {
char *buffer = readFile("mytext.txt");
if (!buffer) return 1; // readFile returns NULL when the file cannot be read
Text *text = createText(buffer);
delete[] buffer;
for (Paragraph* p = text->first; p != NULL; p = p->next) {
for (Sentence* s = p->first; s != NULL; s = s->next) {
for (Word* w = s->first; w != NULL; w = w->next) {
std::cout << w->contents << " ";
}
}
std::cout << std::endl << std::endl;
}
return 0;
}
Keep in mind that I have not tested this code, so it may or may not work.
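The example above never releases the linked lists it builds. A minimal cleanup sketch follows, assuming every node was allocated with `new`, every `contents` buffer with `new[]` (a buffer from `_strdup` or `malloc` would need `free` instead), and every `first`/`next` pointer is either valid or NULL. `freeText` is a hypothetical name, and the struct definitions are repeated only so the block compiles on its own.

```cpp
#include <cstddef>

struct Word { char *contents; Word *next; };
struct Sentence { Word *first; Sentence *next; };
struct Paragraph { Sentence *first; Paragraph *next; };
struct Text { Paragraph *first; };

// Hypothetical helper: frees the whole structure iteratively and returns the
// number of Word nodes released, which is handy for a quick sanity check.
std::size_t freeText(Text *text) {
    if (!text) return 0;
    std::size_t words = 0;
    Paragraph *p = text->first;
    while (p) {
        Sentence *s = p->first;
        while (s) {
            Word *w = s->first;
            while (w) {
                Word *nextWord = w->next;  // save the link before deleting the node
                delete[] w->contents;
                delete w;
                ++words;
                w = nextWord;
            }
            Sentence *nextSentence = s->next;
            delete s;
            s = nextSentence;
        }
        Paragraph *nextParagraph = p->next;
        delete p;
        p = nextParagraph;
    }
    delete text;
    return words;
}
```

Calling `freeText(text);` just before `return 0;` in main would release everything createText allocated. The loops are iterative so that a long text does not deepen the call stack.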