简体   繁体   中英

regarding file i/o in c++

I have a part of a code that does the following: It reads in sentences from a file in a particular format, puts them in a vector. To probe whether the strings in the vector are stored correctly, I put debugging cout statements. I found that the last string member member of the vector is "". Why is this so? The file I am reading from ends with the last floating point value (that is stored in weight in each iteration). There is no whitespace or \\n after that. I am pasting that part of the code in the form of a separate program below.

#include <iostream>
#include <stdio.h>
#include <string>
#include <vector>

using namespace std;


int dist=0;

void stringtolower(char *s)

{

 int i=0;

 char c;

 while(s[i]!='\0')

 {

  c=s[i];

  c=tolower(c);

  s[i]=c;

  i++;

 }

}



void cleanup(char *s)

{
 int i=0;
 dist=0;
 while(*(s+i)=='\r' || *(s+i)=='\n' || *(s+i)=='\t')
 {
  dist++;
  i++;
 }

 while(*(s+i)!='\0'){

    /*if(*(s+i)=='"' || *(s+i)=='`' || *(s+i)=='\'' || *(s+i)=='.')

      *(s+i)=' ';*/

  if(*(s+i)==':' || *(s+i)=='\t' || *(s+i)=='\n' || *(s+i)=='\r' || *(s+i)=='"' || *(s+i)=='`' ){

   *(s+i)='\0';

   break;

  }

  i++;

 }

 return; 

}





int isinlist(vector<string> sents, char *s){

 for(int i=0;i<sents.size();i++){

  if(!sents[i].compare(s)){

   return 1;

  }

 }

 return 0;

}

int main()
{
 char *s=NULL;
 FILE *fp;
 fp=fopen("1.txt","r");
 size_t len=0;
 ssize_t read;
 vector<string> sents;
 float weight;
 while(!feof(fp))
 {
  read=getdelim(&s,&len,':',fp);

  cleanup(s);
  s=s+dist;

  fscanf(fp,"%f",&weight);


  if(isinlist(sents,s)){

   continue;

  }
  stringtolower(s);
  string str(s);

  //sentences.push(str); // Push sentence into FIFO queue for later processing
  sents.push_back(str);
 }
 for(int i=0;i<sents.size();i++)
 {
  cout<<sents[i]<<endl;
 }
}

Thanks a lot for your help.

Because you are not handling end of file (eof) correctly.

You can only tell that you've reached the eof when you've tried to read beyond the end of the file. Consider the case of a 0 length file. When that happens, this will be the case.

FILE *fp = fopen(..., "r");
assert(!feof(fp));  // guaranteed, even if the file is 0 length

That is, even though there is no more data, feof will not return true until it actually tries to read the next byte.

What you need to do is detect end of file during reading. For example:

FILE *fp = fopen(..., "r");
char buffer[SIZE];
while (fgets(buffer, sizeof(buffer), fp) != NULL)
{
    // got some data, do something with it.
}

// fgets returned NULL, now let's check if it was because
// we got to the eof or had an error
if (feof(fp))
    // got to the end
else
    // got an error 

If getdelim is written properly, it should return an indicator when it has reached end-of-file. There are two different ways it may be written:

  1. It only returns indicator provided it hasn't already read any data when it reaches EOF
  2. It always returns indicator when it reaches EOF.

If the former, you want to structure your code like:

while (getdelim(&s,&len,':',fp) != GET_DELIM_EOF_VALUE)

If the latter, you will need something like:

while ((getdelim(&s,&len,':',fp) != GET_DELIMI_EOF_VALUE) ||
       (len != 0))

Some general tips:

Avoid global variables. The dist value is calculated purely within cleanup - it should be local to that function, and then returned from it, so the main function can use it.

Consider advancing pointers instead of using array-style indexing, to reduce the number of variables you need:

void stringtolower(char *s)
{
    char c;

    while (*s != '\0')
    {
        c = *s;
        c = tolower(c);
        *s = c;

        s++;
    }
}

And declare variables as close as possible to where they are used, and initialise them as you declare them:

void stringtolower(char *s)
{
    while (*s != '\0')
    {
        char c = *s;
        c = tolower(c);
        *s = c;

        s++;
    }
}

And avoid making temporary copies where they add no extra clarity:

void stringtolower(char *s)
{
    while (*s != '\0')
    {
        *s = tolower(*s);
        s++;
    }
}

And consider using for to express the usual iteration pattern:

void stringtolower(char *s)
{
    for (; *s != '\0'; s++)
        *s = tolower(*s);
}

Here's a similar job done on cleanup :

int cleanup(char *s)
{
    char *p = s;
    for (; *p == '\r' || *p == '\n' || *p =='\t'; p++);

    int dist = p - s;

    for (; *p != '\0'; p++) 
    {
        if (*p == ':' || 
            *p == '\t' || 
            *p == '\n' || 
            *p == '\r' || 
            *p == '"' || 
            *p == '`' ) 
        {
            *p = '\0';
            break;
        }
    }

    return dist;
}

Pick a single way of laying out braces and stick to it.

Consider using std::find from <algorithm> instead of your isinlist .

On the other hand, for keeping a list like that so you can search for values previously handled, use std::set instead of std::list . It has a built-in find function that will work much faster than a linear search:

std::set<std::string> sent;

...

if (sent.find(x) != sent.end())
    continue;

sent.insert(x);

Prefer std::string to represent intermediate string values. You can use character pointers for convenient manipulation, but you might as well write code the safe way until you have proof that it is a significant cause of your program running slowly.

Use std::ifstream to read input from a file. It will close the file after use automatically, which you forget to do with fclose .

If you do all these things, your program will be a lot shorter and more readable, and so easier for you to find out when you've got something wrong.

You are testing EOF and this does not guarantee that there is any data left for reading. Don't do it.

Parsing integers from a line

See my response there for further information. You should also use std::getline and std::ifstream instead of C file I/O.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM