简体   繁体   中英

Recursive listing files in C++ doesn't enter all subdirectories

!!!Solved!!!

Thank you guys for your help, it's all working now. I made changes to my code as suggested by @RSahu and got it to work.
Thanks for all your input I've been really stuck with this.
To @Basile: I will definitely check that out but for this particular piece of code I'm not gonna use it because it looks way too complicated :) But thanks for suggestion.



Original question

I'm trying to make a C++ code to list all files in given directory and it's subdirectories.

Quick explanation

Idea is that function list_dirs(_dir, _files, _current_dir) will start in top directory and put files into vector _files and when it find a directory it will call itself on this directory. The _current_dir is there to be prepended to file name if in subdirectory because I need to know the path structure (it's supposed to generate sitemap.xml).
In list_dirs there is a call to list_dir which simply returns all files in current directory, not making difference between file and directory.

My problem

What codes does now is that it lists all files in original directory and then all files in one subdirectory but skipping all other subdirectories. It will list them but not the files in them.
And to be even more cryptic, it list files only in this one specific directory and none other. I tried running it in multiple locations but it never went into any other directory.

Thanks in advance and please note that I am beginner at C++ so don't be harsh ;)
LIST_DIR

int list_dir(const std::string& dir, std::vector<std::string>& files){
    DIR *dp;
    struct dirent *dirp;
    unsigned fileCount = 0;

    if ((dp = opendir(dir.c_str())) == NULL){
        std::cout << "Error opening dir." << std::endl;
    }

    while ((dirp = readdir(dp)) != NULL){
        files.push_back(std::string (dirp->d_name));
        fileCount++;
    }

    closedir(dp);
    return fileCount;
}

and LIST_DIRS

int list_dirs (const std::string& _dir, std::vector<std::string>& _files, std::string _current_dir){
    std::vector<std::string> __files_or_dirs;

    list_dir(_dir, __files_or_dirs);

    std::vector<std::string>::iterator it = __files_or_dirs.begin();
    struct stat sb;

    while (it != __files_or_dirs.end()){
        if (lstat((&*it)->c_str(), &sb) == 0 && S_ISDIR(sb.st_mode)){
            /* how to do this better? */
            if (*it == "." || *it == ".."){
                __files_or_dirs.erase(it);
                continue;
            }

            /* here it should go into sub-directory */
            list_dirs(_dir + *it, _files, _current_dir + *it);

            __files_or_dirs.erase(it);
        } else {
            if (_current_dir.empty()){
                _files.push_back(*it);
            } else {
                _files.push_back(_current_dir + "/" + *it);
            }
            ++it;
        }
    }
}

The main problem is in the line:

if (lstat((&*it)->c_str(), &sb) == 0 && S_ISDIR(sb.st_mode)){

You are using the name of a directory entry in the call to lstat . When the function is dealing with a sub-directory, the entry name does not represent a valid path. You need to use something like:

std::string entry = *it;
std::string full_path = _dir + "/" + entry;
if (lstat(full_path.c_str(), &sb) == 0 && S_ISDIR(sb.st_mode)){

Suggestions for improvement

Update list_dir so that it doesn't include "." or ".." in the output. It makes sense to me to exclude those files to start with.

int list_dir(const std::string& dir, std::vector<std::string>& files){
   DIR *dp;
   struct dirent *dirp;
   unsigned fileCount = 0;

   if ((dp = opendir(dir.c_str())) == NULL){
      std::cout << "Error opening dir." << std::endl;
   }

   while ((dirp = readdir(dp)) != NULL){
      std::string entry = dirp->d_name;
      if ( entry == "." or entry == ".." )
      {
         continue;
      }

      files.push_back(entry);
      fileCount++;
   }

   closedir(dp);
   return fileCount;
}

In list_dirs , there is no need to erase items from _files_or_dirs . The code can be simplified with a for loop and by removing the calls to erase items from _files_or_dirs .

It's not clear to me what the purpose of _current_dir is. Perhaps it can be removed.

Here's an updated version of the function. _current_dir is used only to construct the value of the argument in the recursive call.

int list_dirs (const std::string& _dir,
               std::vector<std::string>& _files, std::string _current_dir){
   std::vector<std::string> __files_or_dirs;

   list_dir(_dir, __files_or_dirs);

   std::vector<std::string>::iterator it = __files_or_dirs.begin();
   struct stat sb;

   for (; it != __files_or_dirs.end() ; ++it){
      std::string entry = *it;
      std::string full_path = _dir + "/" + entry;

      if (lstat(full_path.c_str(), &sb) == 0 && S_ISDIR(sb.st_mode)){
         /* how to do this better? */

         /* here it should go into sub-directory */
         list_dirs(full_path, _files, _current_dir + "/" + entry);

      } else {
         _files.push_back(full_path);
      }
   }
}

I am not sure all of the problems in your code but I can tell you that this line and the other one similar to it are going to cause you problems:

__files_or_dirs.erase(it);

When you call erase you invalidate the iterator and references at or after the point of the erase, including the end() iterator (see this erase reference ). You are calling erase and then not storing the returned iterator and are then looking at it again after this call which is not a good thing to do. You should at least change the line to this so that you capture the returned iterator which should point to the element just after the erased element (or end() if it was the last element)

it = __files_or_dirs.erase(it);

It also appears from the code you posted that you have a redundancy between _dir and _current_dir . You do not modify either of them. You pass them in as the same value and they stay the same value throughout the function execution. Unless this is simplified code and you are doing something else, I would recommend you remove the _current_dir one and just stick with _dir . You can replace the line in the while loop with _dir where you are building the file name and you will have simplified your code which is always a good thing.

For this line:

   if (lstat((&*it)->c_str(), &sb) == 0 && S_ISDIR(sb.st_mode)){

Note that readdir and consequently list_dir only return the file name , not the full file path. So at this point (&*it)->c_str() only has a file name (eg "input.txt"), not the full path, so when you call lstat on a file in a subdirectory, the system can't find it.

To fix this, you will need to add in the file path before calling lstat. Something like:

   string fullFileName;
   if (dir.empty()){
       fullFileName = *it;
   } else {
       fullFileName = dir + "/" + *it;
   }

   if (lstat(fullFileName.c_str(), &sb) == 0 && S_ISDIR(sb.st_mode)){

You may have to use _currentDir instead of dir , depending on what they are actually for (I couldn't follow your explanation).

A simpler way on Linux is to use the nftw(3) function. It is scanning recursively the file tree, and you give it some handler function.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM