How can I recover files deleted from both my remote and local repositories, and how can I prevent this loss in the future?

I recently pushed all my project files to a GitHub repo by accident, without first creating a .gitignore file. I then added the .gitignore on GitHub after the fact and deleted the files that it would have excluded from the initial push, had it existed. After doing this I pulled the repo into my local Git repository, thinking that I would only get the .gitignore file. However, all the files to be ignored (.project, .classpath, *.jar, etc.), which are important for the development I was doing in Eclipse, were deleted.

How can I recover these lost files, and how can I go about adding the .gitignore file in the future without deleting the files?

Thanks for the help.

Try deleting the .gitignore file, pushing that change to the remote repository, and then typing

git pull

I see from comments that you already have your files back, which is good and means I don't have to write specific instructions for that (though I'll have a section on it). But: It's important to understand what .gitignore does. It doesn't actually cause files to be ignored! (Which means that .gitignore is not a very good name, but a good name would be really long. Probably it should have been called .gitxyz or .gitconfuse or something, so that it's short and memorable, but you don't think it means "ignore".)

So, what does .gitignore really do? Well, this requires a quick dive into the other part of Git that many tutorials or introductions gloss over, and that causes a lot of confusion: Git's index.

Background

First, let's note that what Git mainly does is to store commits . A commit holds a complete snapshot of all of your files—well, all the ones that the commit contains, but that's a bit redundant: "The commit has what the commit has." The important part here is that there's a full copy of every file, frozen and immutable, saved forever (or at least as long as the commit itself continues to exist).

(The commit also contains some metadata—some information like your name and email address as the author/committer, for instance, and your log message. Crucially, each commit also contains the true name—the hash ID—of the previous or parent commit. But we won't go into that here.)

These frozen files, saved permanently and immutably with each commit, would use up a lot of space if they were not compressed, so they are in a special, compressed, Git-only form. And in fact, since they are frozen, if two different commits use the same data for a file, those two commits actually share the frozen copy. This means that the fact that Git keeps putting the same copy of the file into every new commit, takes no space at all, because it's actually just re-using the old file. But in any case, these files are not useful to anything except Git: no other system can read them directly, and nothing—not even Git—can write on them: they're frozen.
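The sharing of frozen copies can be observed directly. This is a minimal sketch, assuming a throwaway repository in a temporary directory; the file names and the placeholder identity are made up for the demo. It asks Git for the object ID of the same unchanged file in two different commits:

```shell
# Sketch: two commits that contain an identical file share one stored object.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"               # placeholder identity for the demo
git config user.email "example@example.com"

echo "hello" > README.txt
git add README.txt
git commit -q -m "first"

echo "other" > notes.txt                     # README.txt is left unchanged
git add notes.txt
git commit -q -m "second"

# Object ID of README.txt as stored in each of the two commits.
first_blob=$(git rev-parse HEAD~1:README.txt)
second_blob=$(git rev-parse HEAD:README.txt)
echo "$first_blob"
echo "$second_blob"
```

Both lines should print the same hash ID, which is how you can tell that the second commit re-uses the stored file rather than saving a second copy.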

So, to let you see and work on your files, Git has to un-freeze all the files that are saved in some commit, into some sort of work area. Git calls this area the work-tree or working tree. That's a good name, because it's where you do your work.

Other, non-Git, version control systems typically stop here: they have the committed files (maybe stored as deltas instead of full files), and the work-tree, and that's it. When you use one of these systems and go to make a new commit, it takes time, sometimes a whole lot of time. Sometimes you could go out and get lunch while you wait. With Git, though, you run git commit -m message and— zip —it's done.

Git gets all this speed from its index. But index is a terrible name for this thing, so Git also calls it the staging area, or sometimes the cache, depending on who or what is doing the calling. What the index does is hold, in a special Git-only form but this time not frozen, all the files that are going to be in the next commit.

Initially, the index is filled from whichever commit you check out. That is, git checkout <some-commit-specifier> locates the commit, which contains the full set of frozen files. Git copies the frozen files (well, the link to their content) into the index, figuring out the files' full names along the way, so that the index has the list of all the files that Git needs to put in the work-tree. These are now in the special Git-only format, but unfrozen . Git then also puts the files into the work-tree, expanding them into useful format.

The end result is that the index matches the commit, but the index is unfrozen. The work-tree matches both the commit and the index, and of course is unfrozen and files have their useful form. You now do your work as usual—and, this explains why you have to git add your files all the time!

What git add does is to copy the work-tree file into the index. This overwrites the previous copy, if the file was already in the index. The new copy is now in the Git-only format (but still not yet frozen). If the file wasn't in the index before, now it is. In either case, the index is still ready to go. All git commit has to do, besides collecting the metadata like your name and email and log message, is freeze the index.

Hence, the best short description I know of for the index is this: The index contains your proposed next commit. It has all the files in it, in their special Git-only form, but not yet frozen. This is why it's also called the staging area: it has all the files in it, staged and ready to go.
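To see the index acting as the proposed next commit, here is a rough sketch in a scratch repository (the temporary directory, file contents, and identity are assumptions for the demo). It reads the index copy with the :path syntax before and after git add, then compares it with what the new commit stored:

```shell
# Sketch: git add updates the index copy; git commit freezes the index as-is.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"               # placeholder identity for the demo
git config user.email "example@example.com"

echo "v1" > README.txt
git add README.txt
git commit -q -m "first"

echo "version two" > README.txt              # change only the work-tree copy
index_before=$(git rev-parse :README.txt)    # index copy still matches HEAD
git add README.txt                           # copy the work-tree file into the index
index_after=$(git rev-parse :README.txt)     # index copy is now the new content

git commit -q -m "second"                    # freeze whatever the index holds
head_after=$(git rev-parse HEAD:README.txt)

git ls-files --stage                         # list index entries with their object IDs
```

The index entry changes at git add time, not at commit time, and the committed copy ends up with the same ID as the index copy had just before the commit.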

Staged, unstaged, untracked, ignored

Now that you know that there are three copies of each file to worry about, all of this will start to make sense. Let's consider a README.txt file, for instance. You run git checkout master to start, and Git finds the commit for master and checks it out, making that commit the current or HEAD commit:

  • HEAD:README.txt is frozen in the current commit. It will never change, because it's part of that commit.

  • :README.txt is copied into the index, and unfrozen in the process. It could change, but it currently matches HEAD:README.txt.

  • README.txt is copied from the index to the work-tree, expanded into useful form. It could change, but it currently matches :README.txt.

All three copies match, so Git says nothing at all about the file.

If you now change the work-tree copy and run git status, Git's status command compares the HEAD and index copies. They are the same, so it says nothing about this. It compares the index and work-tree copies, and they are different, so git status says the file is not staged for commit.

Once you run git add README.txt, that copies (and compresses) the work-tree version into :README.txt. Now these two match, but HEAD:README.txt and :README.txt are different. So git status compares HEAD vs index and says that the file is staged for commit.

Note that you can change the work-tree copy yet again. Now the file is different in all three versions, and git status tells you that it's both staged for commit (HEAD and index don't match) and not staged for commit (index and work-tree don't match either). This is all based on the result of two git diffs: one from HEAD to index, and one from index to work-tree.
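The three-way comparison can be reproduced in a scratch repository. This is only a sketch; the temporary directory, file contents, and identity are assumptions. It makes the file differ in HEAD, the index, and the work-tree, then runs the two comparisons separately:

```shell
# Sketch: one file that differs in HEAD, in the index, and in the work-tree.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"               # placeholder identity for the demo
git config user.email "example@example.com"

echo "v1" > README.txt
git add README.txt
git commit -q -m "first"

echo "second version" > README.txt
git add README.txt                           # index now differs from HEAD
echo "third version, not added" > README.txt # work-tree now differs from index

status=$(git status --short)
echo "$status"

staged=$(git diff --cached --name-only)      # HEAD vs index
unstaged=$(git diff --name-only)             # index vs work-tree
```

With the short format, the status line is MM README.txt: the first M is the HEAD-vs-index result and the second M is the index-vs-work-tree result.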

But what happens if you have a file that's in the HEAD commit that you remove from the index and work-tree? Well, now it's removed when comparing HEAD vs index. So Git says that a remove is staged. The index and work-tree match, so Git says nothing about that. In any case, your next commit won't have the file.

What happens if you have a file that's in the work-tree, but not in the index? If it's in the HEAD commit it's still a staged remove: the file won't be in the next commit. But the work-tree also has a file that the index lacks, so Git reports it as untracked too.

If the file isn't in HEAD and is not in the index, but is in the work-tree, it's untracked.

This tells us what it means to have an untracked file: a file is untracked if and only if it's not in the index right now. Since you can manipulate the index, adding or removing files whenever you like, you can change the tracked-ness of some file at any time, by just adding it to the index, or taking it out of the index.
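Both untracked cases can be reproduced in a scratch repository. In this sketch (file names, temporary directory, and identity are made up for the demo), notes.txt is in HEAD but is taken out of the index, while scratch.txt was never added at all:

```shell
# Sketch: a file is untracked exactly when it is missing from the index.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"               # placeholder identity for the demo
git config user.email "example@example.com"

echo "notes" > notes.txt
git add notes.txt
git commit -q -m "first"

git rm -q --cached notes.txt                 # remove from the index only
echo "scratch" > scratch.txt                 # never added to the index

status=$(git status --short)
echo "$status"
```

Here notes.txt should appear twice, once as a staged deletion (D in the first column) and once as untracked (??), while scratch.txt appears only as untracked. Running git add notes.txt would put it back in the index and make it tracked again.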

If a file isn't in the index and isn't in the work-tree, it just doesn't exist. It's only files that are in the work-tree, but are not in the index, that are untracked. For those, Git whines at you, nagging that you could git add the file. Listing the file in .gitignore (or in .git/info/exclude) mainly shuts Git up. It doesn't actually cause the file to be untracked: that's a matter of whether the file is in the index. Once the file is in the index, it's tracked, and .gitignore has no effect. It just keeps git status from nagging you. So instead of .gitignore, maybe this should be .git-dont-complain-about-these-files-if-they-are-untracked.

It also has one other important effect. You can run git add . or git add somedir or git add --all to add a whole bunch of files based on Git searching through the entire list of files in a directory / folder. If you list some files as ignored, git add will skip them if they're not already tracked. That is, a tracked file is definitely in Git, so git add will copy it into the index if it's changed. But an untracked file isn't in Git yet, so an en-masse add will add it if it's not ignored. Here, "ignore" is the right word. So maybe the file should be called .git-dont-complain-about-these-files-if-they-are-untracked-and-dont-auto-add-them-with-an-en-masse-add-operation .
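Here is a small sketch of both behaviours, assuming a scratch repository in which .project was committed before any ignore rule existed (the file names echo the question; everything else is made up for the demo):

```shell
# Sketch: .gitignore does not affect tracked files, but an en-masse add skips
# untracked files that match it.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"               # placeholder identity for the demo
git config user.email "example@example.com"

echo "settings" > .project
git add .project
git commit -q -m "first"                     # .project is tracked from here on

printf '.project\n*.jar\n' > .gitignore      # ignore rules added afterwards
echo "changed settings" > .project           # modify the tracked file
echo "binary" > lib.jar                      # new, untracked, matches *.jar

git add .                                    # en-masse add
staged=$(git diff --cached --name-only)
echo "$staged"

ignored=no
if git check-ignore -q lib.jar; then ignored=yes; fi
```

The staged list should contain .gitignore and .project, because .project was already in the index, but not lib.jar, which is untracked and ignored.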

Unfortunately, there's another side effect of listing a file in .gitignore , and that is that you tell Git that it's OK to remove or clobber the file in some cases. So the full proper name for .gitignore might be .git-dont-complain-about-these-files-if-they-are-untracked-and-dont-auto-add-them-with-an-en-masse-add-operation-but-do-feel-free-to-clobber-these-files-sometimes . Imagine if that were the file's name! It would be less confusing, at least.

Drawbacks of removing a file with git rm --cached

As we saw above, if you want to make some file untracked, you must take it out of the index. If you use git rm --cached filename, Git will take the file out of the index (so now it's untracked), but not remove the file from the work-tree (so it's still where you can use it). Your next commit won't have the file, which is what you want.
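Applied to the situation in the question, that might look like the following sketch. The scratch repository, file contents, commit messages, and identity are assumptions; the file names come from the question:

```shell
# Sketch: stop tracking files that were committed by mistake, keeping the
# work-tree copies in place.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"               # placeholder identity for the demo
git config user.email "example@example.com"

echo "settings" > .project
echo "binary" > lib.jar
git add .
git commit -q -m "Initial commit (too many files)"

printf '.project\n*.jar\n' > .gitignore
git rm -q --cached .project lib.jar          # index only; work-tree untouched
git add .gitignore
git commit -q -m "Stop tracking IDE files"

tracked=$(git ls-files)                      # what the index holds now
echo "$tracked"
ls -A                                        # .project and lib.jar are still on disk
```

Note that this only protects the repository where you run it. Another clone that pulls the new commit goes from a commit that has the files to one that doesn't, so Git removes them from that work-tree, which matches what happened in the question.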

But all the old commits, the frozen forever in time commits, that do have the file ... all those old commits still exist. If you ever check out one of those commits, Git will have to copy the frozen file into the index, and then copy the index copy into the work-tree. That will clobber your work-tree version. Is that OK? Git's answer to that is to check for the file in .gitignore !

If the file isn't listed in a .gitignore , Git won't feel free to clobber it. But you'll get complaints about it being untracked. To solve those complaints, you'll probably list the file in .gitignore . Then, checking out an old commit will clobber your files with the extracted, unfrozen, uncompressed ones—and then going back to the new commit will remove the files, because they're now the same as the frozen ones, so it's "safe".
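The clobbering can be reproduced in a scratch repository. This is a sketch under assumptions (made-up file names and contents, and the default checkout behaviour of the Git versions I'm aware of), so treat it as an illustration rather than a guarantee:

```shell
# Sketch: an ignored, untracked file gets overwritten by checking out an old
# commit that still contains it, then removed when switching back.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"               # placeholder identity for the demo
git config user.email "example@example.com"

echo "old" > build.jar
git add build.jar
git commit -q -m "first (build.jar tracked)"

printf '*.jar\n' > .gitignore
git rm -q --cached build.jar
git add .gitignore
git commit -q -m "second (build.jar ignored)"

echo "new local build" > build.jar           # untracked and ignored now

git checkout -q --detach HEAD~1              # old commit still has build.jar
clobbered=$(cat build.jar)                   # local content has been replaced
echo "$clobbered"

git checkout -q -                            # back to where we were
survived=no
if [ -e build.jar ]; then survived=yes; fi
echo "$survived"
```

If build.jar were not listed in .gitignore, the first git checkout should instead stop with a complaint that an untracked file would be overwritten.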

Git needs, but does not have, a way to list a work-tree file as "shut up about this, but never clobber it". If and when this is ever added, that will let you handle your situation. But it's best not to get into the situation at all, if possible.

Recovering what you can

Meanwhile, if you do lose your files, remember that at least there's some version(s) of them in the old frozen commits. You can extract those old frozen versions. There are a couple of ways to do this:

  1. git show: run git show commit:path, e.g., git show v1.0:README.txt or git show a123456:path/to/file.ext. This expands the frozen, saved file to standard output, so you can save it with I/O redirection: git show v1.0:README.txt > README.txt.old, for instance.

  2. git checkout has a mode where, instead of checking out a whole commit, it populates part of your index and work-tree from some existing commit. (This command probably should never have been called git checkout , since it's quite destructive if you have unsaved changes, so be careful with it.) Running git checkout v1.0 -- README.txt or git checkout a123456 -- path/to/file.ext will extract the named file from the named commit, copying (unfreezing) it into your index—so now your next commit will have that version of that file—and then on into your work-tree.

    This one is more useful if you have a whole directory of files to recover, or a glob pattern like *.jar , because you can git checkout the directory or pattern:

     git checkout HEAD~2 -- '*.jar' 

    Here HEAD~2 is the commit to use (two steps back from the current commit along the first-parent chain), and *.jar, which needs quoting to protect it from the shell if there are any *.jar files in the current directory, is the pathspec that Git should match. (I think this should be equivalent to **/*.jar but if not, that is also a valid pathspec.) Since this populates your index, you will have to undo it afterward, e.g., git rm --cached again, or git reset (which also takes pathspecs, so you can git reset -- '*.jar').
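Both methods can be tried in a scratch repository. In this sketch the tag v1.0, the file names, and the identity are made up for the demo; the commands themselves are the ones described above:

```shell
# Sketch: recover files from an older commit after they were deleted.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"               # placeholder identity for the demo
git config user.email "example@example.com"

echo "hello" > README.txt
echo "binary" > lib.jar
git add .
git commit -q -m "first"
git tag v1.0                                 # hypothetical tag naming the old commit

git rm -q README.txt lib.jar                 # simulate the accidental deletion
git commit -q -m "oops, files deleted"

# Method 1: write the frozen copy to standard output and redirect it.
git show v1.0:README.txt > README.txt.old

# Method 2: copy matching files from the old commit into index and work-tree.
git checkout v1.0 -- '*.jar'

# Take the jar back out of the index so it stays an untracked local file.
git reset -q -- '*.jar'

git status --short
```

After this, README.txt.old holds the old README content and lib.jar is back in the work-tree without being staged for the next commit.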

Whether these frozen files are sufficient for your current situation is, of course, situation-dependent.
