
Python: os.stat().st_size gives different value than du

I'm writing a utility that walks a directory tree, computes the size of every file and child directory, and stores the totals. However, the sizes it computes don't match what du reports.

Here's my class, which automatically recurses through all sub-directories:

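import os

class directory:
    '''
    Class that automatically traverses directories
    and builds a tree with size info
    '''
    def __init__(self, path, parent=None):
        # Normalise the path so entry names can be appended directly
        if path[-1] != '/':
            self.path = path + '/'
        else:
            self.path = path
        self.size = 4096  # assumed size of this directory's own entry
        self.parent = parent
        self.children = []
        self.errors = []
        for i in os.listdir(self.path):
            try:
                # lstat() does not follow symlinks; st_size is the apparent size
                self.size += os.lstat(self.path + i).st_size
                if os.path.isdir(self.path + i) and not os.path.islink(self.path + i):
                    # Recurse; note a.size already starts at 4096 for the child itself
                    a = directory(self.path + i, self)
                    self.size += a.size
                    self.children.append(a)
            except OSError:
                # Record the full path of any entry that could not be read
                self.errors.append(self.path + i)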

I have a directory of videos that I'm testing this program with:

>>> a = directory('/var/media/television/The Wire')
>>> a.size
45289964053

However, when I try the same with du, I get

$ du -sx /var/media/television/The\ Wire
44228824

The directories don't contain any links or anything special.

Could someone explain why os.stat() is giving weird size readings?

Platform:

  • Linux (Fedora 13)
  • Python 2.7

Consider this file, foo:

-rw-rw-r-- 1 unutbu unutbu 25334 2010-10-31 12:55 foo

It consists of 25334 bytes.

tune2fs tells me foo resides on a filesystem with block size 4096 bytes:

% sudo tune2fs -l /dev/mapper/vg1-OS1
...
Block size:               4096
...

Thus, a non-empty file on this filesystem will typically occupy at least 4096 bytes, even if its contents consist of just 1 byte. As the file grows larger, space is allocated in 4096-byte blocks.

du reports

% du -B1 foo
28672   foo

Note that 28672/4096 = 7. This says that foo occupies seven 4096-byte blocks on the filesystem, which is the smallest number of blocks needed to hold 25334 bytes.

% du foo
28  foo

Without -B1, this version of du reports the same figure in 1024-byte units: 28672/1024 = 28.
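The arithmetic above can be reproduced in a few lines of Python. This is only a sketch: the helper names allocated_bytes and du_units are made up for illustration, and it assumes that every file is stored in whole 4096-byte blocks and that du is reporting in 1024-byte units.

```python
def allocated_bytes(apparent_size, block_size=4096):
    """Round an apparent file size up to a whole number of blocks.

    Hypothetical helper: assumes the filesystem allocates whole blocks.
    """
    blocks = (apparent_size + block_size - 1) // block_size  # ceiling division
    return blocks * block_size


def du_units(allocated, unit=1024):
    """Express an allocated size in du-style units (assumed 1024 bytes)."""
    return allocated // unit


on_disk = allocated_bytes(25334)
print(on_disk)            # 28672, i.e. seven 4096-byte blocks
print(du_units(on_disk))  # 28
```

The same unit conversion applies to the numbers in the question: if du -sx was reporting 1024-byte units, then 44228824 corresponds to 44228824 * 1024 = 45290315776 bytes, which is within about 0.001% of the 45289964053 bytes computed by the Python class.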

du gives the size on disk by default, whereas st_size is the actual (apparent) file size.

$ du test.txt
    8    test.txt

$ du -b test.txt
    6095 test.txt


>>> os.stat('test.txt').st_size
6095

On Linux (I am using CentOS), du -b reports sizes in bytes and implies --apparent-size, so it returns the size of the file rather than the amount of disk space it uses. Try that and see whether it agrees with what Python's os.stat() reports.
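Both figures can also be read from Python directly. The sketch below assumes a Unix-like platform where os.stat() exposes st_blocks, counted in 512-byte units; the function name apparent_and_disk_size is invented for this example.

```python
import os


def apparent_and_disk_size(path):
    """Return (apparent size, allocated size) in bytes for one file.

    st_size is what 'du -b' reports; st_blocks * 512 is roughly what
    plain 'du -B1' reports. Assumes st_blocks is available (Unix only).
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512
```

For a file like test.txt above, the first value would be the 6095 that du -b prints, and the second would normally be a multiple of the filesystem block size.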

I would write this code as:

import os, os.path

def size_dir(d):
    file_walker = (
        os.path.join(root, f)
        for root, _, files in os.walk(d)
        for f in files
    )
    return sum(os.path.getsize(f) for f in file_walker)

If you want to count directories as 4k, then do something like this:

import os

def size_dir(d):
    total = 4096  # the top-level directory itself
    for root, dirs, files in os.walk(d):
        # Count each subdirectory as 4096 bytes
        total += 4096 * len(dirs)
        # Add the apparent size of every file in this directory
        total += sum(os.path.getsize(os.path.join(root, f)) for f in files)
    return total
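If the goal is to match plain du rather than du -b, one option is to sum the allocated blocks instead of the apparent sizes. The following is a rough sketch, not a reimplementation of du: the name disk_usage is hypothetical, it assumes st_blocks is available and counted in 512-byte units, it silently skips entries it cannot stat, and unlike du it counts hard-linked files once per link.

```python
import os


def disk_usage(top):
    """Approximate 'du -sB1 top' by summing allocated blocks.

    Uses lstat() so symlinks are counted as links, not followed.
    """
    total = os.lstat(top).st_blocks * 512  # the top directory itself
    for root, dirs, files in os.walk(top):
        for name in dirs + files:
            try:
                total += os.lstat(os.path.join(root, name)).st_blocks * 512
            except OSError:
                # Entry vanished or is unreadable; skip it in this sketch
                pass
    return total
```

Dividing the result by 1024 gives a figure comparable to the default output of du -s.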
