Used space calculated by statvfs() for a file system is greater than the sum of the sizes of all files in the fs

I have a small 50MiB partition, formatted as ext4 and mounted on /mnt/tmp, that contains only one directory with a set of photos.

Then I use statvfs() to calculate the bytes used in the partition, and lstat() to calculate the size of every file inside it. For this I wrote the following program:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <sys/statvfs.h>
#include <stdint.h>
#include <inttypes.h>
#include <string.h>
#include <dirent.h>
#include <stdlib.h>
#include <limits.h>

//The amount of bytes of all files found
uint64_t totalFilesSize=0;

//Block size of the fs (f_bsize reported by statvfs)
unsigned long sectorSize=0;

//Print an entry and add its size plus slack (the unused tail of its last block) to the total.
//Note: when st_size is an exact multiple of sectorSize, this still adds one full block.
void addFileSize(const char *abPath, const struct stat *filestat) {
    printf("File: %s\nSize: %jd\n\n", abPath, (intmax_t)filestat->st_size);

    uint64_t slack=sectorSize-((uint64_t)filestat->st_size%sectorSize);

    totalFilesSize+=(uint64_t)filestat->st_size+slack;
}

void readDir(const char *path) {
    DIR *directory;
    struct dirent *d_file;  // a file in *directory

    directory = opendir (path);
    if (directory == NULL) {
        perror(path);
        return;
    }

    while ((d_file = readdir (directory)) != NULL)
    {
        struct stat filestat;
        char abPath[PATH_MAX];

        //Build "<path>/<name>", skipping names that would not fit
        if (snprintf(abPath, sizeof abPath, "%s/%s", path, d_file->d_name) >= (int)sizeof abPath)
            continue;

        if (lstat (abPath, &filestat) != 0) {
            perror(abPath);
            continue;
        }

        switch (filestat.st_mode & S_IFMT)
        {
        case S_IFDIR:
            //Skip "." and ".." to avoid recursing forever
            if (strcmp (".", d_file->d_name) && strcmp ("..", d_file->d_name))
            {
                addFileSize(abPath, &filestat);
                readDir(abPath);
            }
            break;
        case S_IFREG:
            addFileSize(abPath, &filestat);
            break;
        }
    }

    closedir (directory);
}

int main (int argc, char **argv) {

    if(argc!=2) {
        fprintf(stderr, "Error: Missing required parameter.\n");
        return EXIT_FAILURE;
    }

    struct statvfs info;
    if (statvfs (argv[1], &info) != 0) {
        perror("statvfs");
        return EXIT_FAILURE;
    }

    sectorSize=info.f_bsize; //Setting global variable

    //f_blocks and f_bfree are counted in units of f_frsize (equal to f_bsize on ext4)
    uint64_t usedBytes=(uint64_t)(info.f_blocks-info.f_bfree)*info.f_frsize;

    readDir(argv[1]);

    printf("Total blocks: %llu\nFree blocks: %llu\nSize of block: %lu\n"
           "Size in bytes: %" PRIu64 "\nTotal Files size: %" PRIu64 "\n",
           (unsigned long long)info.f_blocks, (unsigned long long)info.f_bfree,
           info.f_bsize, usedBytes, totalFilesSize);

    return 0;
}

Passing the mount point of the partition (/mnt/tmp) as the parameter, the program shows this output:

File: /mnt/tmp/lost+found
Size: 12288

File: /mnt/tmp/photos
Size: 1024

File: /mnt/tmp/photos/IMG_3195.JPG
Size: 2373510

File: /mnt/tmp/photos/IMG_3200.JPG
Size: 2313695

File: /mnt/tmp/photos/IMG_3199.JPG
Size: 2484189

File: /mnt/tmp/photos/IMG_3203.JPG
Size: 2494687

File: /mnt/tmp/photos/IMG_3197.JPG
Size: 2259056

File: /mnt/tmp/photos/IMG_3201.JPG
Size: 2505596

File: /mnt/tmp/photos/IMG_3202.JPG
Size: 2306304

File: /mnt/tmp/photos/IMG_3204.JPG
Size: 2173883

File: /mnt/tmp/photos/IMG_3198.JPG
Size: 2390122

File: /mnt/tmp/photos/IMG_3196.JPG
Size: 2469315

Total blocks: 47249
Free blocks: 19160
Size of block: 1024
Size in bytes: 28763136
Total Files size: 23790592

Note the last two lines. On a FAT32 file system the two amounts are the same, but on ext4 they differ.

So the question is: why?

statvfs() is a filesystem-level operation, so the space used is calculated from the point of view of the filesystem. Therefore:

  1. It includes all filesystem structures. For filesystems based on the traditional Unix design, that means the inodes and any indirect blocks.

    On some of my systems I typically have one 256-byte inode per 32KB of space on the root partition. Smaller partitions may have an even higher inode density, to provide enough inodes for a large number of files; I believe the mke2fs default is one inode per 16KB of space.

    Creating an 850 MB Ext4 filesystem with the default options results in about 54,000 inodes that consume over 13MB of space.

  2. For Ext3/Ext4, it also includes the journal, which has a minimum size of 1024 filesystem blocks. With the common block size of 4KB, that is at least 4MB per filesystem.

    An 850 MB Ext4 filesystem will have a 16MB journal by default.

  3. The result from statvfs() also includes any files that have been deleted but are still open. This often happens on partitions housing tmp directories used by applications.

  4. To see the actual space used by a file with lstat(), you need to use the st_blocks field of the stat structure and multiply it by 512. Judging by the sizes displayed in your program's output, you are using the st_size field, which is the exact file size in bytes. This will typically be smaller than the actual space used: a 5KB file will actually use 8KB on a filesystem with 4KB blocks.

    Conversely, a sparse file will use fewer blocks than its file size indicates.
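A minimal sketch of the st_blocks approach from point 4 follows, assuming Linux, where st_blocks is counted in 512-byte units regardless of the filesystem block size. The helper names apparent_bytes, allocated_bytes and path_sizes are made up for illustration. For the 5KB example above, a filesystem with 4KB blocks would report st_size = 5120 and st_blocks = 16, i.e. 16 × 512 = 8192 allocated bytes.

```c
#define _POSIX_C_SOURCE 200809L   /* for lstat() */
#include <stdint.h>
#include <sys/stat.h>

/* Apparent size: the exact byte count in st_size (what the question's program sums). */
uint64_t apparent_bytes(const struct stat *st) {
    return (uint64_t)st->st_size;
}

/* Allocated size: on Linux st_blocks counts 512-byte units,
   independently of the filesystem block size. */
uint64_t allocated_bytes(const struct stat *st) {
    return (uint64_t)st->st_blocks * 512;
}

/* Fill in both values for one path without following symlinks.
   Returns 0 on success, -1 if lstat() fails. */
int path_sizes(const char *path, uint64_t *apparent, uint64_t *allocated) {
    struct stat st;
    if (lstat(path, &st) != 0)
        return -1;
    *apparent = apparent_bytes(&st);
    *allocated = allocated_bytes(&st);
    return 0;
}
```

Summing allocated_bytes() instead of st_size plus hand-computed slack should bring the program's total close to what du reports, and it also accounts for sparse files.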

As such, the additional space usage mentioned above adds up to rather noticeable amounts, which explains the discrepancy you are seeing.
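To put a number on that discrepancy, the sketch below applies the question's formula to the values its program printed. used_bytes, used_bytes_vfs and unexplained_bytes are hypothetical helpers; they multiply by f_frsize, the unit POSIX specifies for f_blocks and f_bfree (on ext4 it equals f_bsize, 1024 bytes here). (47249 − 19160) × 1024 = 28,763,136 bytes, and subtracting the 23,790,592 bytes of files leaves 4,972,544 bytes, roughly 4.7 MiB that file contents do not explain.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/statvfs.h>

/* Bytes in use: (total blocks - free blocks) * fragment size. */
uint64_t used_bytes(uint64_t blocks, uint64_t bfree, uint64_t frsize) {
    return (blocks - bfree) * frsize;
}

/* Same calculation, straight from a struct filled in by statvfs(). */
uint64_t used_bytes_vfs(const struct statvfs *info) {
    return used_bytes(info->f_blocks, info->f_bfree, info->f_frsize);
}

/* Used space that the summed file sizes do not account for (0 if none). */
uint64_t unexplained_bytes(uint64_t used, uint64_t files_total) {
    return used > files_total ? used - files_total : 0;
}
```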

EDIT:

  1. I just noticed the slack space handling in your program. Although that is not the recommended way to calculate the actually used space (as opposed to the apparent size), it seems to work, so you are not missing space there. On the other hand, you are missing the space used by the root directory of the filesystem, although that would probably be only a block or two :-)

  2. You might want to look at the output of tune2fs -l /dev/xxx. It lists several relevant numbers, including the space reserved for filesystem metadata.
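About the slack handling in point 1: sectorSize - (st_size % sectorSize) yields a whole extra block when st_size is already a multiple of the block size. That is the case for lost+found (12288) and photos (1024) in the question's output, so the program's total ends up 2 × 1024 = 2,048 bytes higher than a plain round-up would give. A round-up helper avoids that; round_up_to_block and slack_bytes are hypothetical names. Like the original, this only estimates allocation from the apparent size and cannot see holes in sparse files or extra mapping blocks, which is why st_blocks remains the better source.

```c
#include <stdint.h>

/* Round a file size up to a whole number of filesystem blocks.
   A size that is already a multiple of block_size is returned unchanged. */
uint64_t round_up_to_block(uint64_t size, uint64_t block_size) {
    return (size + block_size - 1) / block_size * block_size;
}

/* Unused bytes at the end of the last block (0 for an exact multiple). */
uint64_t slack_bytes(uint64_t size, uint64_t block_size) {
    return round_up_to_block(size, block_size) - size;
}
```

For example, IMG_3195.JPG (2,373,510 bytes) rounds up to 2,373,632 bytes on the 1024-byte-block filesystem, 122 bytes of slack.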

BTW, most of the functionality of your program can be achieved with df and du:

# du -a --block-size=1 mnt/
2379776 mnt/img0.jpg
3441664 mnt/img1.jpg
2124800 mnt/img2.jpg
12288   mnt/lost+found
7959552 mnt/
# df -B1 mnt/
Filesystem     1B-blocks     Used Available Use% Mounted on
/dev/loop0      50763776 12969984  35172352  27% /tmp/mnt
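If you would rather stay in C, the du side of that comparison can be sketched with POSIX nftw(): walk the tree without following symlinks, add st_blocks × 512 for every entry (the starting directory included, which also covers the root-directory block mentioned in the EDIT), and count each (st_dev, st_ino) pair only once so hard links are not added twice. du_bytes and its helpers are hypothetical names, the 16 passed to nftw() is an arbitrary limit on simultaneously open directories, and the result is only meant to be roughly comparable to `du -sx --block-size=1`.

```c
#define _XOPEN_SOURCE 700   /* exposes nftw() and struct FTW in <ftw.h> */
#include <ftw.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

/* (device, inode) pairs already counted, so a file with several hard
   links is added only once, as du does by default. */
struct seen_inode { dev_t dev; ino_t ino; };
static struct seen_inode *seen = NULL;
static size_t seen_len = 0, seen_cap = 0;
static uint64_t du_total = 0;

/* Return 1 if (dev, ino) was already recorded; otherwise record it and return 0. */
static int already_seen(dev_t dev, ino_t ino) {
    for (size_t i = 0; i < seen_len; i++)
        if (seen[i].dev == dev && seen[i].ino == ino)
            return 1;
    if (seen_len == seen_cap) {
        size_t cap = seen_cap ? seen_cap * 2 : 64;
        struct seen_inode *p = realloc(seen, cap * sizeof *p);
        if (p == NULL)
            return 0;            /* out of memory: count the entry, just don't remember it */
        seen = p;
        seen_cap = cap;
    }
    seen[seen_len].dev = dev;
    seen[seen_len].ino = ino;
    seen_len++;
    return 0;
}

/* nftw() callback: add the allocated size (st_blocks * 512) of every entry,
   including the starting directory itself. */
static int add_entry(const char *path, const struct stat *st,
                     int type, struct FTW *ftwbuf) {
    (void)path;
    (void)ftwbuf;
    if (type == FTW_NS)          /* lstat() failed, so st is not valid */
        return 0;
    if (!already_seen(st->st_dev, st->st_ino))
        du_total += (uint64_t)st->st_blocks * 512;
    return 0;                    /* 0 = keep walking */
}

/* Roughly what `du -sx --block-size=1 dir` prints; returns -1 on error. */
long long du_bytes(const char *dir) {
    du_total = 0;
    seen_len = 0;
    /* FTW_PHYS: do not follow symlinks; FTW_MOUNT: stay on one filesystem. */
    if (nftw(dir, add_entry, 16, FTW_PHYS | FTW_MOUNT) != 0)
        return -1;
    return (long long)du_total;
}
```

The number du_bytes("/mnt/tmp") returns is the one to set against the used bytes from statvfs(), instead of the sum of st_size values.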

Incidentally, the Ext4 test filesystem displayed above was created using the default mkfs options on a 50MB image file. It has a block size of 1,024 bytes, 12,824 128-byte inodes which consume 1,603 KB, and a 4096-block journal that uses 4,096KB. According to tune2fs, a further 199 blocks are reserved for the group descriptor tables.
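A quick arithmetic check of those figures against the df and du output above follows; the constants are copied from the text and the function names are hypothetical. 12,824 × 128 = 1,641,472 bytes (exactly 1,603 KiB), the journal is 4,096 × 1,024 = 4,194,304 bytes, and the 199 reserved blocks are 203,776 bytes. Together that is 6,039,552 bytes (5,898 KiB), more than the 5,010,432-byte (4,893 KiB) gap between df's Used column and du's total, so apparently not all of this metadata is reported as used space. One likely reason, stated with some caution: ext4's default bsddf mount option leaves some metadata overhead out of the block totals that statvfs() and df report, and which structures fall into that overhead can vary between kernel versions. If the journal is counted as used, it alone covers most of the gap.

```c
#include <stdint.h>

/* Figures quoted for the 50MB test filesystem (tune2fs, df and du output above). */
#define BLOCK_SIZE          1024ULL
#define INODE_COUNT         12824ULL
#define INODE_SIZE          128ULL
#define JOURNAL_BLOCKS      4096ULL
#define GDT_RESERVED_BLOCKS 199ULL
#define DF_USED_BYTES       12969984ULL
#define DU_TOTAL_BYTES      7959552ULL

uint64_t inode_table_bytes(void)  { return INODE_COUNT * INODE_SIZE; }
uint64_t journal_bytes(void)      { return JOURNAL_BLOCKS * BLOCK_SIZE; }
uint64_t gdt_reserved_bytes(void) { return GDT_RESERVED_BLOCKS * BLOCK_SIZE; }

/* Used space according to df that du does not attribute to any file. */
uint64_t df_du_gap(void)          { return DF_USED_BYTES - DU_TOTAL_BYTES; }
```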

The inodes are probably not counted in your total, and they may themselves contain some small amounts of data.

If a file is sparse, its size is bigger than the space it actually occupies.
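To see the sparse-file effect directly, the sketch below (assuming Linux and a writable /tmp; sparse_demo is a hypothetical helper) extends a fresh temporary file with ftruncate() without writing any data. On filesystems that support holes, such as ext4 or tmpfs, the allocated size should come out far smaller than st_size; on a filesystem without hole support it could be as large as st_size. The file is unlinked while its descriptor is still open, which is incidentally the deleted-but-still-open situation whose space statvfs() keeps counting until close().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a temporary file, grow it to `size` bytes with ftruncate() (leaving a
   hole, no data written) and report st_size and st_blocks * 512.
   Returns 0 on success, -1 on error. The file is gone when this returns. */
int sparse_demo(off_t size, uint64_t *apparent, uint64_t *allocated) {
    char path[] = "/tmp/sparse_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                /* the open descriptor keeps the file alive */

    struct stat st;
    int rc = -1;
    if (ftruncate(fd, size) == 0 && fstat(fd, &st) == 0) {
        *apparent = (uint64_t)st.st_size;
        *allocated = (uint64_t)st.st_blocks * 512;
        rc = 0;
    }
    close(fd);                   /* space is released only now */
    return rc;
}
```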

If a file has more than one hard link, all of its names share a single inode, so adding up the size of every path counts that file more than once.
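A small sketch of that sharing, again assuming Linux and a writable /tmp (hardlink_demo is a hypothetical helper): after link(), both names report the same st_dev/st_ino pair and st_nlink becomes 2, so a program that sums st_size per path sees the file twice. du avoids this by remembering the (st_dev, st_ino) pairs it has already counted.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a file plus a second hard link to it under /tmp, then report whether
   both names resolve to the same (st_dev, st_ino) pair and the link count.
   Returns 0 on success, -1 on error; both names are removed afterwards. */
int hardlink_demo(int *same_inode, nlink_t *nlink) {
    char first[] = "/tmp/hardlink_demo_XXXXXX";
    int fd = mkstemp(first);
    if (fd < 0)
        return -1;
    close(fd);

    char second[sizeof first + 8];
    snprintf(second, sizeof second, "%s.link", first);

    int rc = -1;
    struct stat a, b;
    if (link(first, second) == 0) {
        if (lstat(first, &a) == 0 && lstat(second, &b) == 0) {
            *same_inode = (a.st_dev == b.st_dev && a.st_ino == b.st_ino);
            *nlink = a.st_nlink;
            rc = 0;
        }
        unlink(second);
    }
    unlink(first);
    return rc;
}
```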

A paper about Ext4 by Kumar et al. is available here.
