
How to stop tracking files in Git for some time?

I want Git to stop tracking some files while I generate pull requests in Bitbucket. I am not an administrator, so I cannot change anything in Bitbucket itself. Is there a good way to do this? I understand I can use "git rm --cached", but I want this for whole directories and some files, and I would like to set them back to normal after the pull request is done. I changed the .gitignore file, but that only affects untracked files, so the changes still show up. Please help.

A tracked file, in Git, is a file that is in Git's index.

That's the entire definition right there. So git rm --cached, which removes a file from Git's index, will indeed make it untracked. But that's probably not what you want, because Git makes new commits from whatever is in Git's index at the time you run git commit. Unless you're not making any new commits, the fact that these new commits omit the file means that, as compared to any commit that has the file, the file has been "deleted".

Hence, removing the file from the index is only an option if you're not going to make new commits—in which case, whether the file is tracked or not is probably not relevant in the first place.
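To see the "deleted" effect for yourself, here is a minimal sketch in a throwaway repository. The file name local.cfg and the user identity are made up for the illustration:

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository, safe to delete afterwards
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "setting=1" > local.cfg
git add local.cfg
git commit -q -m "add local.cfg"

git rm -q --cached local.cfg      # remove from the index only; the working-tree copy stays
git commit -q -m "stop tracking local.cfg"

# Compared with the previous commit, the new commit records a deletion.
change=$(git diff --name-status HEAD~1 HEAD)
echo "$change"
```

The working-tree file is still on disk, but anyone who compares or merges the new commit sees local.cfg as removed.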

The trick here is to understand that you do not actually want to stop tracking the file. Instead, you want, in new commits you make, to use the same file that you had in some previous commit, regardless of how the file looks in your working tree. That's pretty easy to do, but before you can understand how this works, we need to define a lot of the jargon terms I just used.

What to know about a Git commit

A commit, in Git:

  • is numbered, though the number looks random;
  • contains a snapshot of every file (as found in Git's index: more on this below); and
  • contains some metadata.

The commit's number is a weird, random-looking (but not actually random) and very large number (between 1 and 2^160 - 1, or about 1.46 quindecillion in the short scale system), normally expressed in hexadecimal. Every commit in every Git repository gets a unique number, which is why these have to be so big: any commit you make will have a different number than every other commit ever. This is how Git can tell if two different repositories have the same commit: two commits are only the same if they have the same number.

The snapshot in a commit is the complete set of all files, as seen in Git's index. These files are stored in a special, Git-only, compressed and de-duplicated format. The de-duplication takes care of the fact that most commits actually share most of their files with other commits: these files are only actually stored once.
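One way to observe the de-duplication is to compare the internal object IDs of a file across two commits. This is only a sketch; the file names stable.txt and moving.txt are hypothetical:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "unchanged" > stable.txt
echo "v1" > moving.txt
git add stable.txt moving.txt
git commit -q -m "first"

echo "v2" > moving.txt            # only this file changes in the second commit
git add moving.txt
git commit -q -m "second"

# Identical content gets an identical object ID, so both snapshots
# refer to the same single stored copy of stable.txt.
stable_old=$(git rev-parse HEAD~1:stable.txt)
stable_new=$(git rev-parse HEAD:stable.txt)
moving_old=$(git rev-parse HEAD~1:moving.txt)
moving_new=$(git rev-parse HEAD:moving.txt)
echo "stable.txt: $stable_old $stable_new"
echo "moving.txt: $moving_old $moving_new"
```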

The metadata in a commit contains things like the name and email address of the person who made the commit, and the date-and-time of when they made that commit. It also contains the raw commit number of some earlier commit (as long as there is some earlier commit). This is what glues commits together.
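If you want to look at this metadata directly, git cat-file -p prints a commit's raw contents. A small sketch, again in a scratch repository with a made-up identity:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "first"
first=$(git rev-parse HEAD)       # hash ID of the first commit

echo "two" > file.txt
git add file.txt
git commit -q -m "second"

# Prints the tree (snapshot), parent, author, committer, and message lines.
git cat-file -p HEAD

parent=$(git rev-parse HEAD^)     # the hash ID stored as this commit's parent
```

The parent line of the second commit holds the first commit's hash ID; the first commit has no parent line at all.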

Since the actual hash IDs (the commit numbers) of real commits are so big and ugly, let's draw a tiny repository, with just three commits in it, by pretending that the commits' "numbers" are A, B, and C, in that order. Commit A, being the very first commit ever, doesn't have any previous commit, so it just sits there:

A

Commit B, however, has commit A as its previous commit, so it points backwards to A:

A <-B

Commit C likewise points backwards, but to B:

A <-B <-C

Each commit stores a full snapshot of every file, so by extracting (say) commits B and C and comparing the files thus extracted, Git can tell you what changed between B and C.

All parts of any commit, once made, are completely and totally read-only: no part of any commit can ever be changed. Commits are also mostly permanent (they only go away, eventually, if you can't find them; we'll say more about this in a moment).

Branch names

Now, because commit hash IDs are so big and ugly, we (humans) never want to have to use them. So to help us out, Git gives us branch names. A branch name is just a way to store the hash ID of one (1) commit. We say that the branch name points to the commit, and I like to draw that like this:

A--B--C   <-- main

Note that I get lazy about drawing the arrows between commits. They actually point backwards (from C back to B, then from B to A) but humans tend to like to pretend they go forwards. This mostly doesn't matter so it's mostly OK to not bother drawing them in properly anyway, but you should remember that Git actually works backwards. (It helps out with other things later. We won't need to worry about it here though.)

The name main (or master) currently selects commit C. But this isn't permanent: if we git checkout main, we select commit C. Then, we do various things to get ready to make a new commit.

When we do go to make a new commit by running git commit, Git will:

  1. package up all the files in Git's index (again, more about this in a moment) into a new mostly-permanent snapshot, all de-duplicated as appropriate;
  2. add your name and email address and other metadata as needed;
  3. use the current commit's raw hash ID as the parent for the new commit, so that the new commit points back to the current commit;
  4. write all this out so as to make the new commit, which also acquires the new commit's new hash ID; and finally
  5. write this new commit's hash ID into the current branch name.

Since there can be multiple branch names in a repository, Git remembers which name is the current branch name by attaching the special name HEAD to just one branch name.

Let's draw this. Suppose we have:

A--B--C   <-- main (HEAD)

Now we create a new branch name, such as dev for develop. This branch name must point to some existing commit. Which one should we choose? We can pick any existing commit, A through C, but the obvious one to choose right now is the current one, C:

A--B--C   <-- dev, main (HEAD)

Note how both branch names pick the same commit. So if we now switch the current branch to dev, like this:

A--B--C   <-- dev (HEAD), main

nothing else has to change, and in fact, nothing else does change.

Now we prepare a new commit (in the way we'll think and talk more about in a moment) and we run git commit. Git packages up the files, adds the metadata, and makes a new commit D that points back to existing commit C:

A--B--C
       \
        D

What happens with the branch names? We just said: Git writes the new commit's hash ID into the current branch name, the one HEAD is attached to. Nothing else happens to any other branch name. So now we have:

A--B--C   <-- main
       \
        D   <-- dev (HEAD)

Commit D is our new commit; commits A, B, and C are our existing commits; D points back to C; main still points to C because it has not moved, but dev now points to our new commit D because our current branch is dev.
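The same picture can be reproduced with real commands. This is a sketch in a scratch repository; the single commit here stands in for C, and the branch names follow the drawings above:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch "main" whatever the local default is
git config user.name "Example User"
git config user.email "user@example.com"

echo "c" > file.txt
git add file.txt
git commit -q -m "C"
c=$(git rev-parse HEAD)

git checkout -q -b dev            # new name dev points at C; HEAD is now attached to dev
echo "d" > file.txt
git add file.txt
git commit -q -m "D"              # only the current branch name (dev) moves

main_tip=$(git rev-parse main)
dev_tip=$(git rev-parse dev)
dev_parent=$(git rev-parse dev^)
current=$(git symbolic-ref --short HEAD)
echo "main=$main_tip dev=$dev_tip current=$current"
```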

If you git checkout main or git switch main, Git will remove the files that go with commit D and extract the files that go with commit C. That's because by changing branches, you've changed which commit you're using:

A--B--C   <-- main (HEAD)
       \
        D   <-- dev

If you want the commit-D files back, you git checkout dev, which currently means commit D. The branch names move! The commits don't move: they're read-only (completely) and permanent (mostly). We find them using the branch names, and then, if needed, working backwards.

Let's check out dev again and make one more commit, then switch back to main, then make a new branch feature and switch to feature:

        D--E   <-- dev
       /
A--B--C   <-- feature (HEAD), main

I drew dev on top this time just because I want our new commits to go on the bottom row. The names feature and main both select commit C right now, and the files we have to work with are those from commit C. We make some changes and add and commit and get a new commit F:

        D--E   <-- dev
       /
A--B--C   <-- main
       \
        F   <-- feature (HEAD)

If we make another commit, we end up with:

        D--E   <-- dev
       /
A--B--C   <-- main
       \
        F--G   <-- feature (HEAD)

All we ever really do is add commits. The existing commits remain. We can check them out, by name (with git checkout main for instance) or by raw hash ID (resulting in what Git calls a detached HEAD), and we can make new commits. We can never change any existing commit.

You now know what branches are in Git. But there's more to learn.

Git's index and your working tree

You have made commits before, so you know that you run git checkout, edit some file(s), run git add, and then run git commit. (Or, maybe you used the slightly sleazy short-cut, git commit -a.) Your Git tutorial(s) may have mentioned the index or the staging area. Unless they were pretty good tutorials, though, they probably didn't cover it properly.

Covering this thing properly is essential, because the index is the key to everything here. Of course, the name index is pretty poor. It's so bad, in fact, that Git has two other names for it. The name staging area, which might be what your tutorials used, is another name for the index. The third name, pretty rare these days, is the cache; this name mostly turns up in flags like git rm --cached. All three names refer to the same thing. But: what is this thing?

We already mentioned that the files inside a commit are stored in a special, compressed, read-only, Git-only, de-duplicated format. Only Git can read these files, and literally nothing can write them, not even Git itself. (Git can create new ones, but not overwrite existing ones. The hash ID tricks that Git uses require this, so even if it's physically possible to overwrite them—the OS usually doesn't prevent it—that just breaks the repository, and then everyone is sad and out of work and we starve to death, or something else miserable out of a Chinese film.)

How, then, can we ever get any work done? To get work done, we need ordinary read/write files, that we can read and write. So Git will extract the files from a commit. Indeed, pretty much all version control systems work this way: you take the committed files out of the VCS and turn them into regular files that you then work with.

This explains your working tree. Your working tree is where you have your files, to work with however you like. Once the files are out of Git, Git is hands-off on these things. Those are now your files, unless and until you tell Git to do something with them, like replace them with files from some other commit.

But what about the index? Why is there this mysterious "staging area" thing? Other version control systems don't seem to have one—and indeed, most don't, or at least don't whack you in the face with it, the way Git does. So it's not necessary , but Git has it, and Git will whack you in the face with it. You'd better know about it.

What Git does with this staging area is pretty simple:

  • When you check out a commit, Git copies the commit's files into the staging area. These files are in the special read-only de-duplicated format, so they don't actually take any space (though there's a modest amount of space per file, on the order of 100 bytes or so, for cache data that Git wants here).

  • These files, being in the compressed-and-de-duplicated format, are ready to go into the next commit. That is, the staging area, right now, at this point, consists of the current commit, which is ready to be the next commit.

  • Git also copies the files from the commit/staging-area to your working tree, expanding them into normal usable files, so that you can work with them.

Once you've worked with a file and it's ready to be updated, Git makes you run git add on the file. This:

  • reads the working tree copy, compressing it and turning it into the internal Git form; and
  • checks to see if it's a duplicate.

If it is a duplicate, Git then puts into the index the duplicate "copy", which takes no space, just as before. This file is now ready to be committed.

If it's not a duplicate, Git puts the ready-to-go file into the index (though technically Git puts it elsewhere and puts just a reference to it into the index, so that it looks pretty much the same as if it were a duplicate).

Either way, this file is now ready to be committed —along with every other file that's already in the index. The old version of the file, if there was one, has been booted out of the index; now the current version is ready to commit.

So git add just copies the file into Git's index, making it ready to commit. This means that, at all times, the index holds your proposed next commit. That's what the index really is, mostly: your proposed next commit. (During conflicted merges, the index takes on an expanded role, but Git also won't let you commit, so it's still sort of your proposed but not-commit-able commit even then. We won't go into further detail here, so we can just leave it at "proposed next commit".)
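The index entries themselves can be listed with git ls-files --stage, which prints the mode, the internal object ID, a stage number, and the path. The sketch below (with a hypothetical notes.txt) watches one entry before and after git add:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "first"

git ls-files --stage              # one line per index entry
before=$(git ls-files --stage notes.txt | cut -d' ' -f2)

echo "v2, edited" > notes.txt     # working tree changes; the index still holds v1
unchanged=$(git ls-files --stage notes.txt | cut -d' ' -f2)

git add notes.txt                 # now the index entry refers to the new content
after=$(git ls-files --stage notes.txt | cut -d' ' -f2)
expected=$(git hash-object notes.txt)   # object ID of the current working-tree content
echo "before=$before after=$after"
```

Editing the file alone leaves the index entry untouched; only git add swaps in the new object ID.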

When you do run git commit, Git simply packages up whatever is in the index at that time. If you forget to git add a modified file, Git will commit whatever is in the index under that name (if anything), and what's in the index is still the old version of the file. And, if you don't git add a file on purpose, the same thing happens.
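Here is a small sketch of that behavior, with two hypothetical files where only one is added before the second commit:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "old" > a.txt
echo "old" > b.txt
git add a.txt b.txt
git commit -q -m "first"

echo "new" > a.txt
echo "new" > b.txt
git add a.txt                     # b.txt is left out, by accident or on purpose
git commit -q -m "second"

committed_a=$(git show HEAD:a.txt)   # what the new commit holds for a.txt
committed_b=$(git show HEAD:b.txt)   # what the new commit holds for b.txt
working_b=$(cat b.txt)               # what is on disk for b.txt
echo "a.txt in commit: $committed_a"
echo "b.txt in commit: $committed_b (working tree: $working_b)"
```

The second commit carries the old b.txt even though the working-tree copy has changed, which is exactly the effect you are after for files you want to keep out of a pull request.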

It's hard to see what's in the index directly, but it's easy to compare what's in the index to other things:

  • git diff --cached compares what's in the current commit to what's in the index right now. You can use git diff --staged as a synonym, if you like the term staging area better (it is a better term than cache).

    Hence, if something doesn't show up as different here, the index copy of the file matches the committed copy of the file.

  • git diff compares what's in the index right now to what's in the working tree right now. That is, if something shows up here, the index copy of the file doesn't match the working tree copy. The diff shows you what's different.

  • The git status command runs both of these git diff commands for you, but with --name-status so as to not show the actual changes. Instead, the first diff (HEAD-vs-index) tells you what files if any would be different if you committed right now. The status command calls these files staged for commit. The second diff, index-vs-working-tree, tells you what files you could git add if you wanted; the status command calls these files not staged for commit.
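The three comparisons above can be tried on a file that has one staged change and one further unstaged change. This is a sketch with a made-up file name:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > f.txt
git add f.txt
git commit -q -m "first"

echo "v2" > f.txt
git add f.txt                          # index now differs from the current commit
echo "v3, not yet added" > f.txt       # working tree now differs from the index

staged=$(git diff --cached --name-status)   # current commit vs index
unstaged=$(git diff --name-status)          # index vs working tree
short=$(git status --short)                 # both comparisons in one line per file
echo "$short"
```

In the short status format the first column is the HEAD-vs-index result and the second is the index-vs-working-tree result, so this file shows up with M in both columns.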

Pull requests

Pull requests are, unfortunately, a little bit complicated. To understand them properly, we need to realize that Git commits have to be passed around.

Suppose you have, in your Git repository, the following chain of commits:

...--F--G--H   <-- main
            \
             I--J   <-- feature (HEAD)

But your Git repository is a clone of some other Git repository, perhaps over on GitHub. Their repository has commits—which you got from them—and branch names.

Your Git doesn't use their Git's branch names. Your Git takes their Git's branch names and renames them. That's because your branch names are yours, as your Git moves them around by adding new commits. We cannot (and don't want to) move their names around while we add our own commits, after all. So, if your Git (by "your Git" I mean your Git software, running on your laptop for instance, using your repository) has commit H as your latest main-branch commit, and their Git (their software with their repository, over on GitHub) has commit H as their latest main-branch commit, your own repository will actually have this:

...--F--G--H   <-- main, origin/main
            \
             I--J   <-- feature (HEAD)

The name origin/main is a remote-tracking name. It just remembers where their branch name points. Your Git will update this whenever your Git gets updates from their Git.

Now, if you made commits I and J yourself, and you run:

git push origin feature

your Git will call up their Git and tell them the hash ID for commit J. They'll look in their repository and find that they don't have this commit (because you made it). Your Git is now obligated to offer them commit I too; they won't have that one and will ask for it. Your Git must now offer them commit H, but this time they do have the commit, so they'll say "no thanks, I have that one already". This enables your Git to stop offering commits, because the fact that they have H means they also have every earlier commit.

Your Git now packages up I and J and any files and other supporting objects needed for these new commits and sends them over. Then your Git will ask their Git to create or update a branch name feature in their repository. If that goes well, they now have the name feature selecting commit J (which you and they now share) and that looks like this in your repository:

...--F--G--H   <-- main, origin/main
            \
             I--J   <-- feature (HEAD), origin/feature

In their repository, they only have (their) main pointing to H, and (their) feature pointing to J: they don't have these special remote-tracking names that your Git uses; your git push has them set their branch names directly. Hence, sometimes they'll refuse a push, perhaps because they already have that branch name, and they're using it to remember some other important commit hash ID.
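The push and the resulting names can be tried without any hosting service. In this sketch a local bare repository stands in for the hosted one (that substitution, and the directory names seed, origin.git, and work, are assumptions made for the illustration):

```shell
set -eu
top=$(mktemp -d)
cd "$top"

# Build "their" repository: one commit (standing in for H) on main, stored as a bare repo.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "h" > file.txt
git add file.txt
git commit -q -m "H"
cd ..
git clone -q --bare seed origin.git

# Your clone: their main shows up here as origin/main.
git clone -q origin.git work
cd work
git config user.name "Example User"
git config user.email "user@example.com"
git checkout -q -b feature
echo "i" > file.txt
git add file.txt
git commit -q -m "I"
git push -q origin feature        # sends the new commit, asks them to set their feature

local_tip=$(git rev-parse feature)
tracking_tip=$(git rev-parse origin/feature)
their_refs=$(git --git-dir=../origin.git for-each-ref --format='%(refname)')
echo "$their_refs"
```

After the push, your origin/feature matches your feature, while their repository lists only plain branch names under refs/heads.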

Anyway, assuming they do accept this push, you can now, over on GitHub, generate a pull request using their branch name feature. That will ask someone to look over these new commits and, if they (the someone) like them, incorporate those new commits into their branch(es) somehow. We'll ignore, for now, the how and all the possible consequences, as these get very complicated.

What if you put files into your commits that you should not have?

Suppose that, in commits I and/or J, you updated some file you didn't mean to update.

You literally can't take this back. Commits I and J are stuck the way they are, forever. But you don't need to. Just make a new branch name that points to commit H and get on it:

...--F--G--H   <-- main, origin/main, refeature (HEAD)
            \
             I--J   <-- feature, origin/feature

Now use Git's facilities, such as git cherry-pick -n, to grab updates from commit I but not commit them yet. Then use the git restore command (with --staged, for instance) to put the index copy of the unwanted file or files back to the version in the current commit, and make a new commit with git commit:

             K   <-- refeature (HEAD)
            /
...--F--G--H   <-- main, origin/main
            \
             I--J   <-- feature, origin/feature

Repeat with commit J as appropriate:

             K--L   <-- refeature (HEAD)
            /
...--F--G--H   <-- main, origin/main
            \
             I--J   <-- feature, origin/feature

Close the bad pull request, have GitHub delete their feature (git push origin --delete feature), push refeature, and make a new pull request. Decide whether or not to keep your own feature branch; rename it if you like; do whatever you want with it. It's your branch, not anyone else's.
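Put together, the rebuild might look something like the sketch below. It uses a scratch repository with one feature commit, and the file names app.txt and local.cfg are hypothetical. It also assumes Git 2.23 or later, since that is where git restore first appeared:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

# Commit H on main: both files at their shared content.
echo "app v1" > app.txt
echo "shared settings" > local.cfg
git add app.txt local.cfg
git commit -q -m "H"

# Commit I on feature: a wanted change plus an accidental change to local.cfg.
git checkout -q -b feature
echo "app v2" > app.txt
echo "my private settings" > local.cfg
git add app.txt local.cfg
git commit -q -m "I"

# Rebuild the work on a new branch that starts at H.
git checkout -q -b refeature main
git cherry-pick -n feature                         # apply I's changes without committing
git restore --source=HEAD --staged -- local.cfg    # index copy back to H's version
git commit -q -m "K: app change only"

changed=$(git diff --name-only main refeature)     # files the new branch actually changes
committed_cfg=$(git show refeature:local.cfg)
working_cfg=$(cat local.cfg)
echo "changed on refeature: $changed"
```

Because only the index copy was restored, the working-tree local.cfg keeps your private content and shows up as "not staged for commit"; as long as you avoid git add on it (and avoid git commit -a), later commits keep using the old version. Adding --worktree to the git restore command would reset the working-tree copy as well.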

