
How to list first level directories only in C?

In a terminal I can call ls -d */ to list only the directories. Now I want a program to do that for me, like this:

#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <unistd.h>

int main( void )
{
    int status;

    char *args[] = { "/bin/ls", "-l", NULL };

    if ( fork() == 0 )
        execv( args[0], args );
    else
        wait( &status ); 

    return 0;
}

This runs ls -l on everything. However, when I try:

char *args[] = { "/bin/ls", "-d", "*/",  NULL };

I get a runtime error:

ls: */: No such file or directory

The lowest-level way to do this is with the same Linux system calls ls uses.

So look at the output of strace -efile,getdents ls:

execve("/bin/ls", ["ls"], [/* 72 vars */]) = 0
...
openat(AT_FDCWD, ".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
getdents(3, /* 23 entries */, 32768)    = 840
getdents(3, /* 0 entries */, 32768)     = 0
...

getdents is a Linux-specific system call. The man page says that it's used under the hood by libc's readdir(3) POSIX API function.
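To see what "the same system calls ls uses" looks like in practice, here is a minimal Linux-only sketch that calls getdents64 directly through syscall(2). The record layout is copied by hand from the getdents(2) man page; the struct name kernel_dirent64 and the function name list_subdirs_getdents() are made up for this sketch. Like the readdir() approach, it needs a fallback (fstatat() here) when d_type is DT_UNKNOWN or a symlink. Real code should normally prefer readdir().

```c
#define _GNU_SOURCE
#include <dirent.h>       /* DT_DIR, DT_UNKNOWN, DT_LNK */
#include <fcntl.h>        /* open(), O_DIRECTORY */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>     /* fstatat(), S_ISDIR */
#include <sys/syscall.h>  /* SYS_getdents64 */
#include <unistd.h>       /* syscall(), close() */

/* One record as returned by getdents64 (layout from the getdents(2) man page). */
struct kernel_dirent64 {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;  /* total size of this record, including padding */
    unsigned char  d_type;
    char           d_name[];  /* NUL-terminated name */
};

/* Print the first-level subdirectories of 'path' (excluding . and ..).
   Returns how many were printed, or -1 on error. Linux only. */
int list_subdirs_getdents(const char *path)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    _Alignas(uint64_t) char buf[32768];  /* the kernel packs variable-length records here */
    int count = 0;
    for (;;) {
        long nread = syscall(SYS_getdents64, fd, buf, sizeof buf);
        if (nread < 0) {
            close(fd);
            return -1;
        }
        if (nread == 0)                  /* end of directory */
            break;
        for (long pos = 0; pos < nread; ) {
            struct kernel_dirent64 *d = (struct kernel_dirent64 *)(buf + pos);
            pos += d->d_reclen;          /* advance to the next record */
            if (strcmp(d->d_name, ".") == 0 || strcmp(d->d_name, "..") == 0)
                continue;
            int is_dir = (d->d_type == DT_DIR);
            if (d->d_type == DT_UNKNOWN || d->d_type == DT_LNK) {
                struct stat st;          /* fallback: stat relative to the directory fd */
                is_dir = fstatat(fd, d->d_name, &st, 0) == 0 && S_ISDIR(st.st_mode);
            }
            if (is_dir) {
                printf("%s/\n", d->d_name);
                count++;
            }
        }
    }
    close(fd);
    return count;
}
```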


The lowest-level portable way (portable to POSIX systems) is to use the libc functions to open a directory and read the entries. POSIX doesn't specify the exact system call interface, unlike for non-directory files.

These functions:

DIR *opendir(const char *name);
struct dirent *readdir(DIR *dirp);

can be used like this:

// print all directories, and symlinks to directories, in the CWD.
// like sh -c 'ls -1UF -d */'  (single-column output, no sorting, append a / to dir names)
// tested and works on Linux, with / without working d_type

#define _GNU_SOURCE    // also enables _DEFAULT_SOURCE (formerly _BSD_SOURCE) for DT_UNKNOWN etc.
#include <dirent.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    DIR *dirhandle = opendir(".");     // POSIX doesn't require this to be a plain file descriptor.  Linux uses open(".", O_DIRECTORY); to implement this
    if (!dirhandle) {
        perror("opendir");
        return EXIT_FAILURE;
    }
    struct dirent *de;
    while ((de = readdir(dirhandle)) != NULL) { // NULL means end of directory (or an error)
        if (de->d_name[0] == '.')
            continue;                  // like the */ glob: skip ., .. and hidden names
        _Bool is_dir;
    #ifdef _DIRENT_HAVE_D_TYPE
        if (de->d_type != DT_UNKNOWN && de->d_type != DT_LNK) {
           // don't have to stat if we have d_type info, unless it's a symlink (since we stat, not lstat)
           is_dir = (de->d_type == DT_DIR);
        } else
    #endif
        {  // the only method if d_type isn't available,
           // otherwise this is a fallback for FSes where the kernel leaves it DT_UNKNOWN.
           struct stat stbuf;
           // stat follows symlinks, lstat doesn't.
           // A failed stat (e.g. a dangling symlink) counts as "not a directory".
           is_dir = stat(de->d_name, &stbuf) == 0 && S_ISDIR(stbuf.st_mode);
        }

        if (is_dir) {
           printf("%s/\n", de->d_name);
        }
    }
    closedir(dirhandle);
    return 0;
}

There's also a fully compilable example of reading directory entries and printing file info in the Linux stat(3posix) man page (not the Linux stat(2) man page; it has a different example).


The man page for readdir(3) says the Linux declaration of struct dirent is:

   struct dirent {
       ino_t          d_ino;       /* inode number */
       off_t          d_off;       /* not an offset; see NOTES */
       unsigned short d_reclen;    /* length of this record */
       unsigned char  d_type;      /* type of file; not supported
                                      by all filesystem types */
       char           d_name[256]; /* filename */
   };

d_type is either DT_UNKNOWN, in which case you need to stat to learn anything about whether the directory entry is itself a directory, or it is DT_DIR or something else, in which case you can be sure whether or not it is a directory without having to stat it.

Some filesystems, like EXT4 I think, and very recent XFS (with the new metadata version), keep type info in the directory, so it can be returned without having to load the inode from disk. This is a huge speedup for find -name: it doesn't have to stat anything to recurse through subdirs. But for filesystems that don't do this, d_type will always be DT_UNKNOWN, because filling it in would require reading all the inodes (which might not even be loaded from disk yet).

Sometimes you're just matching on filenames and don't need type info, so it would be bad if the kernel spent a lot of extra CPU time (or especially I/O time) filling in d_type when it's not cheap. d_type is just a performance shortcut; you always need a fallback (except maybe when writing for an embedded system where you know what FS you're using and that it always fills in d_type, and you have some way to detect the breakage when someone in the future tries to use this code on another FS type).
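Putting the fast path and the fallback together, a reusable check might look like the sketch below. The names entry_is_dir() and list_subdirs() are made up for this example. It trusts d_type when the kernel filled it in, and otherwise calls fstatat() relative to dirfd() of the open directory, so it also works when the directory being listed is not the current working directory. Like ls -d */ it follows symlinks, but unlike the glob it only skips . and .., not other hidden names.

```c
#define _DEFAULT_SOURCE  /* DT_* constants and dirfd() with glibc */
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Is this entry a directory (following symlinks, like stat())?
   Uses d_type when available, otherwise falls back to fstatat(). */
static bool entry_is_dir(DIR *dir, const struct dirent *de)
{
#ifdef _DIRENT_HAVE_D_TYPE
    if (de->d_type != DT_UNKNOWN && de->d_type != DT_LNK)
        return de->d_type == DT_DIR;   /* fast path: no inode lookup needed */
#endif
    struct stat st;
    /* stat relative to the directory being read, not the CWD */
    if (fstatat(dirfd(dir), de->d_name, &st, 0) != 0)
        return false;                  /* e.g. dangling symlink: treat as non-directory */
    return S_ISDIR(st.st_mode);
}

/* Print the first-level subdirectories of 'path'; returns how many, or -1 on error. */
int list_subdirs(const char *path)
{
    DIR *dir = opendir(path);
    if (!dir)
        return -1;
    int count = 0;
    struct dirent *de;
    while ((de = readdir(dir)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        if (entry_is_dir(dir, de)) {
            printf("%s/\n", de->d_name);
            count++;
        }
    }
    closedir(dir);
    return count;
}
```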

Just call system. Globs on Unixes are expanded by the shell, and system will give you a shell.

You can avoid the whole fork-exec thing by doing the glob(3) yourself:

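#include <glob.h>
#include <stdio.h>

int main(void)
{
    int ec;
    glob_t gbuf;
    // The trailing slash makes the pattern match only directories
    // (and symlinks to directories); no shell is involved.
    if (0 == (ec = glob("*/", 0, NULL, &gbuf))) {
        char **p = gbuf.gl_pathv;
        if (p) {
            while (*p)
                printf("%s\n", *p++);
        }
        globfree(&gbuf);   // release the memory glob() allocated
    } else if (ec != GLOB_NOMATCH) {
        /* handle glob error (GLOB_NOSPACE or GLOB_ABORTED) */
        return 1;
    }
    return 0;
}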

You could pass the results to a spawned ls, but there's hardly a point in doing that.

(If you do want to do fork and exec, you should start with a template that does proper error checking; each of those calls may fail.)
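A minimal sketch of such a template, assuming /bin/sh exists as on POSIX systems: it runs a command string through sh -c, so globs are expanded by the shell just as with system(), and it checks fork(), the exec and waitpid(). The name run_shell() is hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* fork() + exec of /bin/sh -c CMD + waitpid(), with every call checked.
   The shell expands globs in CMD before running it; execv() alone never does.
   Returns CMD's exit status, or -1 on failure or if the shell was killed by a signal. */
int run_shell(const char *cmd)
{
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* child: replace this process with the shell */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        perror("execl");        /* only reached if the exec failed */
        _exit(127);             /* conventional "could not run command" status */
    }

    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR) {   /* retry only when interrupted by a signal */
            perror("waitpid");
            return -1;
        }
    }
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;                  /* e.g. terminated by a signal */
}
```

With it, run_shell("ls -d */") behaves like the terminal command from the question, because the shell expands the glob before ls starts.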

If you are looking for a simple way to get a list of folders into your program, I'd rather suggest the spawnless way: don't call an external program, and use the standard POSIX opendir/readdir functions.

It's almost as short as your program, but has several additional advantages:

  • you get to pick folders and files at will by checking the d_type
  • you can elect to discard system entries and (semi)hidden entries early, by testing whether the first character of the name is a .
  • you can immediately print out the result, or store it in memory for later use
  • you can do additional operations on the list in memory, such as sorting and removing other entries that don't need to be included.

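#define _DEFAULT_SOURCE     /* for DT_DIR with glibc in strict standard modes */
#include <stdio.h>
#include <sys/types.h>
#include <dirent.h>         /* <sys/dir.h> is only an old BSD compatibility header */

int main( void )
{
    DIR *dirp;
    struct dirent *dp;

    dirp = opendir(".");
    if (dirp == NULL)
    {
        perror("opendir");
        return 1;
    }
    while ((dp = readdir(dirp)) != NULL)
    {
        /* d_type is an enumerated value, not a bit mask, so compare with ==
           (DT_SOCK, for example, shares a bit with DT_DIR).
           Some filesystems report DT_UNKNOWN; those entries would need stat(). */
        if (dp->d_type == DT_DIR)
        {
            /* exclude common system entries and (semi)hidden names */
            if (dp->d_name[0] != '.')
                printf ("%s\n", dp->d_name);
        }
    }
    closedir(dirp);

    return 0;
}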
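The last two advantages in the list above, keeping the results in memory and sorting them, could be sketched like this. collect_subdirs() and free_names() are hypothetical names. Unlike the program above, this version also falls back to fstatat() when d_type is DT_UNKNOWN, and it sorts with strcmp(), i.e. by byte value rather than by locale (strcoll() would give locale order).

```c
#define _DEFAULT_SOURCE   /* DT_* constants, dirfd() and strdup() with glibc */
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* qsort() comparator: the elements are char *, so we receive char ** */
static int cmp_names(const void *a, const void *b)
{
    return strcmp(*(char *const *)a, *(char *const *)b);
}

/* Free an array returned by collect_subdirs(). */
void free_names(char **names, size_t n)
{
    for (size_t i = 0; i < n; i++)
        free(names[i]);
    free(names);
}

/* Collect the non-hidden subdirectory names of 'path' into a heap-allocated,
   sorted array stored in *out. Returns the number of names, or -1 on error. */
int collect_subdirs(const char *path, char ***out)
{
    DIR *dir = opendir(path);
    if (!dir)
        return -1;

    char **names = NULL;
    size_t n = 0, cap = 0;
    struct dirent *de;
    while ((de = readdir(dir)) != NULL) {
        if (de->d_name[0] == '.')        /* skip ".", ".." and hidden names early */
            continue;
        int is_dir = (de->d_type == DT_DIR);
        if (de->d_type == DT_UNKNOWN || de->d_type == DT_LNK) {
            struct stat st;              /* fallback when d_type is not filled in */
            is_dir = fstatat(dirfd(dir), de->d_name, &st, 0) == 0 && S_ISDIR(st.st_mode);
        }
        if (!is_dir)
            continue;
        if (n == cap) {                  /* grow the array geometrically */
            size_t newcap = cap ? 2 * cap : 16;
            char **tmp = realloc(names, newcap * sizeof *names);
            if (!tmp)
                goto fail;
            names = tmp;
            cap = newcap;
        }
        if ((names[n] = strdup(de->d_name)) == NULL)
            goto fail;
        n++;
    }
    closedir(dir);

    if (n > 1)
        qsort(names, n, sizeof *names, cmp_names);
    *out = names;                        /* may be NULL when n == 0 */
    return (int)n;

fail:
    closedir(dir);
    free_names(names, n);
    return -1;
}
```

A caller would do char **names; int n = collect_subdirs(".", &names); then loop over names[0] to names[n-1], and finally call free_names(names, n).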

Unfortunately, all solutions based on shell expansion are limited by the maximum command line length, which varies (run true | xargs --show-limits to find out); on my system, it is about two megabytes. Yes, many will argue that it suffices, as did Bill Gates on 640 kilobytes, once.

(When running certain parallel simulations on non-shared filesystems, I do occasionally have tens of thousands of files in the same directory during the collection phase. Yes, I could do that differently, but that happens to be the easiest and most robust way to collect the data. Very few POSIX utilities are actually silly enough to assume "X is sufficient for everybody".)
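The same limit can be queried from C with sysconf(_SC_ARG_MAX). This is a small sketch with made-up function names; the per-name estimate is only rough, since it assumes each argument costs its characters, a terminating NUL and one argv pointer, and it ignores the environment, which shares the same budget.

```c
#include <stdio.h>
#include <unistd.h>

/* Limit, in bytes, on the argument and environment strings passed to exec();
   exceeding it makes exec fail with E2BIG ("Argument list too long").
   Returns -1 if the system reports no definite limit. */
long arg_max(void)
{
    return sysconf(_SC_ARG_MAX);
}

/* Print the limit and a rough estimate of how many names of a given length
   fit in one command line (ignores the environment). */
void show_arg_limit(size_t name_len)
{
    long max = arg_max();
    if (max == -1) {
        puts("ARG_MAX: no definite limit");
        return;
    }
    long per_name = (long)(name_len + 1 + sizeof(char *));  /* chars + NUL + argv pointer */
    printf("ARG_MAX: %ld bytes, roughly %ld names of %zu characters\n",
           max, max / per_name, name_len);
}
```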

Fortunately, there are several solutions. One is to use find instead:

system("/usr/bin/find . -mindepth 1 -maxdepth 1 -type d");

You can also format the output as you wish (-printf is a GNU find extension), not depending on locale:

system("/usr/bin/find . -mindepth 1 -maxdepth 1 -type d -printf '%p\n'");

If you want to sort the output, use \0 as the separator (since filenames are allowed to contain newlines), and -z for sort so that it uses \0 as the record separator, too. tr will then convert the separators to newlines for you. Inside a C string literal the backslash has to be doubled (\\0); a plain \0 would end the string right there:

system("/usr/bin/find . -mindepth 1 -maxdepth 1 -type d -printf '%p\\0' | sort -z | tr '\\0' '\\n'");

If you want the names in an array, use the glob() function instead.

Finally, as I like to harp on every now and then, one can use the POSIX nftw() function to implement this internally:

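#define _GNU_SOURCE   /* FTW_ACTIONRETVAL and FTW_SKIP_SUBTREE are GNU extensions */
#include <stdio.h>
#include <ftw.h>

#define NUM_FDS 17    /* POSIX minimum of 20 open files, minus stdin, stdout, stderr */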

int myfunc(const char *path,
           const struct stat *fileinfo,
           int typeflag,
           struct FTW *ftwinfo)
{
    const char *file = path + ftwinfo->base;
    const int depth = ftwinfo->level;

    /* We are only interested in first-level directories.
       Note that depth==0 is the directory itself specified as a parameter.
    */
    if (depth != 1 || (typeflag != FTW_D && typeflag != FTW_DNR))
        return 0;

    /* Don't list names starting with a . */
    if (file[0] != '.')
        printf("%s/\n", path);

    /* Do not recurse. */
    return FTW_SKIP_SUBTREE;
}

and the nftw() call to use the above is obviously something like

if (nftw(".", myfunc, NUM_FDS, FTW_ACTIONRETVAL)) {
    /* An error occurred. */
}

The only "issue" in using nftw() is choosing a good number of file descriptors the function may use (NUM_FDS). POSIX says a process must always be able to have at least 20 open file descriptors. If we subtract the standard ones (input, output, and error), that leaves 17. The above is unlikely to use more than 3, though.

You can find the actual limit using sysconf(_SC_OPEN_MAX), and subtract the number of descriptors your process may use at the same time. In current Linux systems, it is typically limited to 1024 per process.

The good thing is, as long as that number is at least 4 or 5 or so, it only affects performance: it just determines how deep nftw() can go in the directory tree structure before it has to use workarounds.
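Following that advice, a hypothetical helper could compute the nopenfd argument from sysconf(_SC_OPEN_MAX) instead of hard-coding NUM_FDS. The fallback to 20 when the limit is indeterminate, and the clamping, are my own choices for this sketch.

```c
#include <limits.h>
#include <unistd.h>

/* Hypothetical helper: pick the nopenfd argument for nftw() from the process's
   descriptor limit, leaving 'in_use' descriptors for the rest of the program. */
int nftw_fd_budget(long in_use)
{
    long max = sysconf(_SC_OPEN_MAX);
    if (max == -1)
        max = 20;            /* indeterminate limit: assume the POSIX minimum */
    long budget = max - in_use;
    if (budget < 1)
        budget = 1;          /* POSIX expects nopenfd in the range [1, OPEN_MAX] */
    if (budget > INT_MAX)
        budget = INT_MAX;    /* nftw() takes an int */
    return (int)budget;
}
```

It would then be used as nftw(".", myfunc, nftw_fd_budget(3), FTW_ACTIONRETVAL), reserving the three standard streams.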

If you want to create a test directory with lots of subdirectories, use something like the following Bash:

mkdir lots-of-subdirs
cd lots-of-subdirs
for ((i=0; i<100000; i++)); do mkdir directory-$i-has-a-long-name-since-command-line-length-is-limited ; done

On my system, running

ls -d */

in that directory yields a bash: /bin/ls: Argument list too long error, while the find command and the nftw()-based program all run just fine.

You also cannot remove the directories using rmdir directory-*/ for the same reason. Use

find . -name 'directory-*' -type d -print0 | xargs -r0 rmdir

instead. Or just remove the entire directory and its subdirectories:

cd ..
rm -rf lots-of-subdirs

Another, less low-level approach, with system():

#include <stdlib.h>

int main(void)
{
    system("/bin/ls -d */");
    return 0;
}

Notice that with system() you don't need to fork(). However, I recall that we should avoid using system() when possible!
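If you do use system(), note that its return value is a wait status, not the command's exit code, so it needs the same decoding as waitpid(). A minimal sketch (run_system() is a hypothetical name):

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Run cmd through system() and decode the wait status it returns.
   Returns the command's exit status, or -1 if system() itself failed
   (e.g. fork() failed) or the command was terminated by a signal. */
int run_system(const char *cmd)
{
    int status = system(cmd);
    if (status == -1) {
        perror("system");           /* the child could not be created or waited for */
        return -1;
    }
    if (WIFEXITED(status))
        return WEXITSTATUS(status); /* 127 usually means the shell could not run cmd */
    return -1;                      /* e.g. killed by a signal */
}
```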


As Nominal Animal said, this will fail when the number of subdirectories is too big! See his answer for more...

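Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.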

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM