
Why is the size of a copied git folder on exFAT bigger than its original on ext4?

I have a git folder (which is a git repo) on an ext4 disk:

ytsen@ytsen-MacBookPro:~$ du -hcs ~/git
3,2M    /home/ytsen/git
3,2M    total

Now I copy this to a USB stick with an exFAT file system:

ytsen@ytsen-MacBookPro:~$ cp -r ~/git /media/ytsen/USB\ Flash/git3
cp: preserving permissions for ‘/media/ytsen/USB Flash/git3/hooks’: Function not implemented
cp: preserving permissions for ‘/media/ytsen/USB Flash/git3/objects/49’: Function not implemented
cp: preserving permissions for ‘/media/ytsen/USB Flash/git3/objects/68’: Function not implemented

<snip>

and now the size on the stick of this folder is much bigger:

ytsen@ytsen-MacBookPro:~$ du -hcs /media/ytsen/USB\ Flash/git3
23M /media/ytsen/USB Flash/git3
23M total

Question: Why is this? Should I worry?

Question: Do I now have a problem with the file permissions (see the generated output of cp)?

PS. I can check out from the copy on the USB stick and there seems to be no problem at all in retrieving the files or history ...

Cluster Sizes and Slack Space

The amount of space used by a file includes more than just the file's bytes. In ext2/3/4 and FAT-based filesystems, each file takes up at least one block/cluster * , and each block/cluster belongs to at most one file. So any remaining space in the cluster that's not part of the file's contents is basically wasted. The common term for this wasted space is "slack space".

How much space is wasted depends in part on how big the clusters/blocks are. Generally, the smaller you expect your files to be, the smaller you want your clusters, because small clusters mean less slack space.
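To put numbers on slack space, here is a minimal sketch of the rounding involved. The function names are made up for illustration, and it assumes a file's data is stored in whole clusters and that an empty file needs no data cluster at all; real filesystems add metadata on top of this.

```python
def allocated_size(file_bytes: int, cluster_bytes: int) -> int:
    """Bytes occupied by file data when storage is handed out in whole clusters."""
    if file_bytes == 0:
        return 0  # assumption: an empty file needs no data cluster
    clusters = -(-file_bytes // cluster_bytes)  # ceiling division
    return clusters * cluster_bytes


def slack_space(file_bytes: int, cluster_bytes: int) -> int:
    """Bytes allocated to the file but not used by its contents."""
    return allocated_size(file_bytes, cluster_bytes) - file_bytes


if __name__ == "__main__":
    # A 100-byte file with 4 KiB blocks versus 32 KiB clusters.
    print(slack_space(100, 4 * 1024))   # 3996
    print(slack_space(100, 32 * 1024))  # 32668
```

The file is the same 100 bytes in both cases; only the amount of padding around it changes.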

A FAT filesystem, though, includes a "file allocation table" (for which it is named) that records which clusters are occupied by which files. The FAT has an entry for each cluster on the drive; if the clusters are half as big, there are twice as many of them, and thus the FAT ends up with twice as many entries to manage. So the system tends to favor big clusters (16 KiB, 32 KiB, maybe even higher -- exFAT allows up to 32 MiB/cluster, though that's probably quite uncommon).
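The "half the cluster size, twice the entries" relationship is plain division. A small sketch, using a hypothetical 16 GiB stick (the question doesn't say how big the drive is) and ignoring the space the filesystem reserves for its own structures:

```python
KIB = 1024
GIB = 1024 ** 3


def cluster_count(volume_bytes: int, cluster_bytes: int) -> int:
    """Approximate number of clusters, and thus FAT entries, on a volume."""
    return volume_bytes // cluster_bytes


if __name__ == "__main__":
    volume = 16 * GIB  # hypothetical stick size, not taken from the question
    for cluster_kib in (4, 8, 16, 32):
        entries = cluster_count(volume, cluster_kib * KIB)
        print(f"{cluster_kib:>2} KiB clusters -> {entries} FAT entries")
```

Each doubling of the cluster size halves the number of entries the table has to track.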

In ext4, on the other hand, things are done differently -- in a way that's less sensitive to the size/count of blocks. So it doesn't mind small blocks as much, and will often have blocks 1, 2, or 4 KiB in size. (Big filesystems might have larger blocks, but at that point, space probably isn't an issue.)

With all that said, a 100-or-so byte file might easily occupy 4 KiB on an ext4 filesystem, and 32 KiB on exFAT. So if you have lots of small files, you'll notice a huge increase in space usage when you move those files from a FS with small blocks to a FS with much bigger ones.
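If you want a rough idea of how much a tree will grow before copying it, something like the following sketch can help. `estimate_usage` is a made-up helper; it only counts regular file data, skips symlinks, and ignores the clusters that directories and filesystem metadata take up, so treat the result as a lower bound rather than what `du` will report.

```python
import os


def estimate_usage(root: str, cluster_bytes: int) -> int:
    """Estimate bytes used by regular files under root for a given cluster size."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # symlinks are not counted in this sketch
            size = os.path.getsize(path)
            # Round each file up to a whole number of clusters.
            total += -(-size // cluster_bytes) * cluster_bytes
    return total


if __name__ == "__main__":
    repo = os.path.expanduser("~/git")  # path from the question; adjust as needed
    for cluster_kib in (4, 32):
        used = estimate_usage(repo, cluster_kib * 1024)
        print(f"{cluster_kib} KiB clusters: about {used / 2**20:.1f} MiB")
```

A git repo is close to the worst case for this, since loose objects and refs are mostly tiny files.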

Preserving permissions (or...not)

Some filesystems (including most previous FAT filesystems) don't support *nix-style permissions. Others do, but the driver hasn't been written to take advantage of them. In those cases, typically the system will either approximate the permissions as closely as feasible using the FS's own built-in features, or just say "screw it" and make the files accessible by anyone who has access to the device.

(With exFAT, I'm not 100% sure which will happen...but I'd put my money on the latter.)

Either way, Git typically doesn't care all that much; it just cares that it can read and write the files in the repo (and particularly in the .git folder). If you have enough access to copy the files and actually see them afterward, you should be fine.

(One caveat, though. I'm not sure whether -- or how well -- exFAT handles symlinks. If your repo contains any, I'm not sure what will happen.)
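To find out whether the symlink caveat even applies to you, you can list the symlinks in the original tree before copying. A minimal sketch, with `find_symlinks` being a name invented here:

```python
import os


def find_symlinks(root: str) -> list[str]:
    """Return the paths of all symbolic links under root, sorted."""
    links = []
    # os.walk does not follow symlinks by default, so links that point at
    # directories show up in dirnames and are not descended into.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                links.append(path)
    return sorted(links)


if __name__ == "__main__":
    for link in find_symlinks(os.path.expanduser("~/git")):
        print(link, "->", os.readlink(link))
```

If this prints nothing, there are no symlinks in the tree to worry about.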

As for why only those three particular directories gave you problems: I would venture a guess that you made a couple of commits and/or fetches under a different username (maybe you said sudo git fetch origin or the like, for example), and the files/directories created as part of doing so are owned by that user. You might want to check those directories in the original and see what's different about them -- it's entirely possible that not preserving their permissions is a good thing here, and actually made the exFAT copy more correct than the original.


* "Blocks" and "clusters" are basically the same thing. But MS -- and thus any documentation it provides on FAT-based filesystems -- likes to call them "clusters".

Git relies heavily on hard links, which exFAT does not support.
