
How does git detect modified files on Windows?

(This is not a duplicate of "How does git detect that a file has been modified?" because I'm asking about Windows; the referenced Q&A mentions stat and lstat, which do not apply to Windows.)

With traditional systems like SVN and TFS, the "state database" needs to be explicitly and manually informed of any changes to files in your local workspace: files are read-only by default so you don't accidentally make a change without explicitly informing your SVN/TFS client first. Fortunately, IDE integration means that operations that result in the addition, modification, deletion and renaming of files (i.e. "checking out") can be automatically passed on to the client. It also means that you need something like TortoiseSVN to work with files in Windows Explorer, lest your changes be ignored, and that you should regularly run an often lengthy server-to-local comparison scan to detect any changes.

But Git doesn't have this problem. On my Windows machine I can have a gigabyte-sized repo with hundreds of thousands of files, many levels deep, and yet if I make a 1-byte change to a very deeply nested file, Git knows about it when I run git status. This is the strange part: Git does not use any daemon processes or background tasks, and running git status does not involve any significant IO activity that I can see. I get the results back immediately, and it does not thrash my disk searching for the change I made.

Additionally, Git GUI tools, such as the Git integration in Visual Studio 2015, also have some degree of magic in them: I can make a change in Notepad or another program, and VS's Git Changes window picks it up immediately. VS could simply be using ReadDirectoryChangesW (FileSystemWatcher), but when I look at the devenv process in Process Explorer I don't see any corresponding handles, and in any case that would not explain how git status sees the changes.

Git runs a Windows equivalent of the POSIX lstat(2) call on each file recorded in the index as a first pass at figuring out whether the file has been modified. It compares the modification time and size returned by that call with the values recorded for that file in the index.
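The first-pass check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Git's code: IndexEntry, record and looks_modified are hypothetical names, and the real index stores more fields than the two compared here.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical stand-in for the stat data recorded per tracked file."""
    path: str
    size: int
    mtime_ns: int


def record(path):
    """Snapshot the size and modification time of a file, as an index would."""
    st = os.lstat(path)
    return IndexEntry(path, st.st_size, st.st_mtime_ns)


def looks_modified(entry):
    """Cheap check: compare current stat data with the recorded values.

    The file is never opened; only its metadata is queried.
    """
    try:
        st = os.lstat(entry.path)
    except FileNotFoundError:
        return True  # a tracked file that has disappeared counts as changed
    return st.st_size != entry.size or st.st_mtime_ns != entry.mtime_ns
```

Because only metadata is read, the cost per file does not depend on the file's size, which is why even a very large working tree can be checked without reading its contents.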

This operation is notoriously slow on NTFS (and on network-mapped drives), so some time ago Git for Windows gained a special tweak, controlled by the core.fscache configuration option, which became enabled by default some 2 or 3 GfW releases ago. I don't know the exact details, but it tries to minimize the number of times Git needs to lstat(2) your files.

IIUC, the mechanism enabled by core.fscache does not use the Win32 filesystem-watching APIs, since Git runs no daemons or services on your system; it merely optimizes the way Git asks the filesystem layer for the stat info of the tracked files.
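To give a feel for what "asking the filesystem more efficiently" can mean, here is a rough Python sketch that collects stat data for a whole directory in one enumeration instead of one query per file. It is an assumption-laden illustration of the general technique, not a description of how core.fscache is actually implemented; stat_directory is a hypothetical helper. It relies on the documented behaviour of os.scandir, whose entries on Windows normally carry their stat data from the directory listing itself.

```python
import os


def stat_directory(dirpath):
    """Return {name: (size, mtime_ns)} for the regular files in one directory.

    The directory is enumerated once; per-file lookups afterwards are
    answered from the returned dictionary instead of hitting the filesystem.
    """
    cache = {}
    with os.scandir(dirpath) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                st = entry.stat(follow_symlinks=False)
                cache[entry.name] = (st.st_size, st.st_mtime_ns)
    return cache
```

A caller checking many tracked files in the same directory would call stat_directory once and then look each file name up in the result.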

As Briana Swift and kostix point out, it is scanning your disk. However, when looking for unstaged changes, it does not need to read every file on your disk. Instead, it can look at the metadata stored in the index to determine which files to examine more closely (by actually reading them).

If you use the git ls-files command to examine the index, you can see this metadata:

% git ls-files --debug worktree.c
worktree.c
  ctime: 1463782535:0
  mtime: 1463782535:0
  dev: 16777220 ino: 120901250
  uid: 501      gid: 20
  size: 5591    flags: 0

Now if you run git status, git will look at worktree.c on disk. If the timestamps and file size match, then git will assume that you have not changed this file.

If, however, the timestamps and file size do not match, then git will look more closely at the file to determine whether you have changed it or not.
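The two-stage check can be sketched as follows. This is a minimal illustration under stated assumptions, not Git's implementation: is_really_modified and its parameters are hypothetical names, and the sketch ignores line-ending conversion and other filters that Git may apply before hashing. The one part taken directly from Git is the blob object ID, which is the SHA-1 of the header "blob <size>" followed by a NUL byte and then the file content.

```python
import hashlib
import os


def git_blob_sha1(data: bytes) -> str:
    """Compute the object ID Git assigns to a blob with this content."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def is_really_modified(path, recorded_size, recorded_mtime_ns, recorded_sha1):
    """Stat first; read and hash the file only when the stat data differs."""
    st = os.lstat(path)
    if st.st_size == recorded_size and st.st_mtime_ns == recorded_mtime_ns:
        return False  # cheap path: metadata matches, the file is not opened
    # Expensive path: metadata differs, so compare the actual content.
    with open(path, "rb") as f:
        return git_blob_sha1(f.read()) != recorded_sha1
```

The second stage is what keeps a file whose timestamp changed but whose content did not (for example after being touched or rewritten identically) from being reported as modified.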

So git does "thrash" the disk, but in a much more limited manner than if you did something like tf reconcile to examine your changes. (TFVC, of course, was designed to deal with very large working trees and should never touch your disk if you're using it correctly.)

And yes, Visual Studio does have some magic in it. It runs a background filesystem watcher on both your working directory and some parts of the Git repository. When it notices a change in your working directory, it recomputes the git status. It also watches for changes to branches in the Git repository to know when you've switched branches or when to recompute the status of your local repository against your remote.

Git's process of git status is very lightweight.

git status checks the working directory (changes made before you run git add) and the index, also known as the staging area (changes after git add but before git commit), then compares those files with the last committed version. Instead of having to go through every file in the repository, Git first checks these areas to see what to look up in the most recent commit.

git diff works similarly. I suggest looking here for more information.

