
How do I convert simple non source controlled project backups into a versioned git repository?

I have been extremely naughty. I have been developing a piece of software (I'm the only developer) for a little while (OK, it's over the course of a few years), but have not used any sort of source control.

I have resolved to use source control from now on (git seems most likely, as the Windows tools seem to have come on a lot in the last few months). What I do have is dated backups of the entire directory of my (.NET) solution.

What I would like to do is automagically have my backups visible in the revision history. It will be messy. Projects and files will have been added/removed over the course of the solution history. I'm not bothered about such problems as what I know to be renamed files being interpreted as removal of a file and addition of a new, unrelated one.

More generally my problem is: I have time ordered copies of a changing directory. Importing the first into git is easy I assume. But, I then want all subsequent copies of the directory to be merged, in date order, one at a time without me having to commit every sub-directory and file individually.

Is this possible, or is it just that I am punished for not using source control from the off?

Edit: If I go ahead with the 'commit all snapshots individually' method manually (there are fewer than 20 snapshots), is there a way of overriding the commit dates with the dates I have for the snapshots, as Esko Luontola suggests I might want to? git commit does not appear to have a flag to allow this. Is there another way (I'm using Vista)?

Edit: In answer to my own question about using the original dates: you have to set the GIT_AUTHOR_DATE and/or GIT_COMMITTER_DATE environment variables to override the use of the current date and time when performing the commit.

The reason there are two sets of variables (the full set is GIT_(AUTHOR|COMMITTER)_(NAME|DATE|EMAIL)) is to distinguish between, say, the author who emails a patch and the maintainer who actually commits it into the repo.

Note if using git extensions on VS: If you set (export varname="value") these variables using the 'git bash' command line, and then switch back to the GUI to do a commit, it seems to ignore them. You have to stay on the command line and run 'git commit' from there.
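As a minimal sketch of the above, assuming a git bash (or other POSIX-style) shell: the helper name commit_with_date is made up for illustration, and it simply sets both date variables for a single git commit call. Git accepts several date formats; an ISO 8601 timestamp without a time zone, as used in the example comment, is interpreted in the local time zone.

```shell
# Hypothetical helper (not part of git): commit the staged changes with
# both the author date and the committer date set to the same timestamp.
#   $1 = date in a format git understands, e.g. "2009-03-14T12:00:00"
#   $2 = commit message
commit_with_date() {
    # The variables are set only for this one git invocation,
    # so later commits fall back to the current date and time.
    GIT_AUTHOR_DATE="$1" GIT_COMMITTER_DATE="$1" git commit -q -m "$2"
}

# Example (run inside the repository, after "git add -A ."):
#   commit_with_date "2009-03-14T12:00:00" "Snapshot from 14 March 2009"
```

Because the variables are passed on the same command line as git commit, nothing has to be exported or unset afterwards. Reasonably recent git versions also accept git commit --date="...", but that flag changes only the author date, not the committer date.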

You can use the example git-fast-import based tools distributed in the git.git repository: import-zips.py (in Python) or import-tars.perl (in Perl), or use those scripts as the basis of your own import script. You can find them in the contrib/fast-import/ directory.

There might be an already automated way of doing this, but git should be smart enough to let you git init in your oldest backup and then repeatedly copy the .git folder to incrementally newer backups and create a commit with everything for each one. Scripting this should be pretty easy.

Even better, you could pass a --git-dir= option or set the $GIT_DIR environment variable so that git uses a repository located somewhere other than the current directory, saving the copying step.

Something like this:

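# Create the repository once, in the directory that will hold the history.
cd "$FINAL_DIR"
git init

# Point git at that repository regardless of the current directory.
export GIT_DIR="$FINAL_DIR/.git"

# Treat the next backup as the working tree and record it as one commit.
cd "$NEXT_BACKUP"
git add -A .
git commit -m "Snapshot from $NEXT_BACKUP"
# rinse and repeat for each newer backup, oldest first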
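The "rinse and repeat" step could be scripted roughly as follows. This is only a sketch under some assumptions: the function name import_snapshots is invented here, the snapshot directories are passed oldest first, the repository directory is new or empty, and the commit message is simply the snapshot's directory name.

```shell
# Hypothetical helper: import_snapshots REPO_DIR SNAPSHOT_DIR...
# Creates REPO_DIR as a git repository and records each snapshot
# directory, in the order given, as one commit.
import_snapshots() {
    repo=$1
    shift
    git init -q "$repo" || return 1
    # Absolute path, so it stays valid after changing directory.
    repo_git=$(cd "$repo" && pwd)/.git
    for snapshot in "$@"; do
        (
            # With GIT_DIR set, the current directory acts as the work tree.
            cd "$snapshot" || exit 1
            export GIT_DIR="$repo_git"
            git add -A . &&
            # --allow-empty keeps the loop going if two snapshots are identical.
            git commit -q --allow-empty -m "Snapshot: $(basename "$snapshot")"
        ) || return 1
    done
}

# Example (directory names are placeholders):
#   import_snapshots ~/myproject-git backups/2009-01-01 backups/2009-02-01
```

The subshell keeps GIT_DIR and the directory change from leaking into the calling shell. Nothing is checked out into REPO_DIR by the loop itself; running git reset --hard inside it afterwards populates it with the newest snapshot.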

I don't quite know why you don't want to just commit all snapshots individually. I mean, a shell script (or Perl, Python, Ruby, Tcl, whatever) to do that is probably less than 5 lines of code and less than 10 minutes of work.

Also, there is git load-dirs, which would allow you to cut that down to maybe 3 lines and 5 minutes. But you still have to load every dir individually.

But, if you are so inclined, there is the git fast-import tool which is intended to make writing repository converters and importers easier. According to the manpage, you could write an importer in about 100 lines and a couple of hours.
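To give a feel for what such an importer produces, here is a rough sketch that writes one fast-import commit per snapshot directory. The function name emit_snapshot_commit, the branch name imported and the committer identity are all placeholders chosen for this example. It also assumes file names without newlines and stores every file as a regular non-executable file (mode 100644), so it is far from a complete importer.

```shell
# Hypothetical helper: print a fast-import "commit" command for one snapshot.
#   $1 = snapshot directory, $2 = Unix timestamp (UTC), $3 = commit message
emit_snapshot_commit() {
    snapshot=$1 when=$2 message=$3
    # No "from" line: fast-import uses the branch's current tip as the parent.
    printf 'commit refs/heads/imported\n'
    printf 'committer Importer <importer@example.invalid> %s +0000\n' "$when"
    # "data" is followed by the exact byte count of the payload.
    printf 'data %s\n%s\n' "$(printf '%s' "$message" | wc -c | tr -d ' ')" "$message"
    # Start from an empty tree so files missing from this snapshot are removed.
    printf 'deleteall\n'
    (cd "$snapshot" && find . -type f ! -path './.git/*' | sort |
        while IFS= read -r f; do
            printf 'M 100644 inline %s\n' "${f#./}"
            printf 'data %s\n' "$(wc -c < "$f" | tr -d ' ')"
            cat "$f"
            printf '\n'   # optional line feed after the raw file data
        done)
}

# Example (placeholder paths; run inside an existing repository):
#   { emit_snapshot_commit backups/2009-01-01 1230768000 "Snapshot 2009-01-01"
#     emit_snapshot_commit backups/2009-02-01 1233446400 "Snapshot 2009-02-01"
#   } | git fast-import
```

Because the stream carries the timestamp directly, the original snapshot dates end up in the history without any environment variables. The result lands on the branch named in the stream, which can then be checked out or merged as usual.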

However, all this ignores the biggest problem: the value of a VCS lies not in the contents – you could just as well use regular backups for that – but in the commit messages. And no magic tool is going to help you there, you'll have to type them all in yourself … and more importantly, you'll have to remember exactly why you made every single little change over the last years.

Also check out the "A Custom Importer" section in the Migrating to Git chapter, which talks about this exact issue.

If you are using the import-tars importer (in contrib/fast-import/), be aware that it used to create phony files at the top level of the repository whenever the archive contained a global PAX header (which git archive uses to record the commit). Those phony entries defeated the importer's own logic for detecting and omitting the common leading directory. This was corrected in Git 2.27 (Q2 2020).

See commit c839fcf (24 Mar 2020) by Johannes Schindelin (dscho).
(Merged by Junio C Hamano -- gitster -- in commit 8633f21, 22 Apr 2020)

import-tars: ignore the global PAX header

Signed-off-by: Johannes Schindelin

The tar importer in contrib/fast-import/import-tars.perl has a very convenient feature: if all paths stored in the imported .tar start with a common prefix, e.g. git-2.26.0/ in the tar at https://github.com/git/git/archive/v2.26.0.tar.gz, then this prefix is stripped.

This feature makes a ton of sense because it is relatively common to import two or more revisions of the same project into Git, and obviously we don't want all files to live in a tree whose name changes from revision to revision.

Now, the problem with that feature is that it breaks down if there is a pax_global_header "file" located outside of said prefix, at the top of the tree.

This is the case for .tar files generated by Git's very own git archive command: it inserts that header, and git archive allows specifying a common prefix (that the header does _not_ share with the other files contained in the archive) via --prefix=my-project-1.0.0/.

Let's just skip any global header when importing .tar files into Git.

Note: this global header might contain useful information.
For example, in the output of git archive, it lists the original commit, which _is_ useful information.

A future improvement to the import-tars.perl script might be to include that information in the commit message, or do other things with the information (e.g. use mtime information contained in the global header as date of the commit).
This patch does not prevent any future patch from making that happen, it only prevents the header from being treated as if it was a regular file.
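The prefix rule described in that commit message can be illustrated with a small sketch. This is not the Perl script's actual code: common_leading_dir is a made-up helper that reads archive member paths, one per line, skips the pax_global_header entry, and prints the leading directory only if every remaining path shares it.

```shell
# Hypothetical illustration of the rule: read member paths on stdin and
# print the common leading directory (with a trailing slash), if there is one.
common_leading_dir() {
    awk -F/ '
        $0 == "pax_global_header" { next }    # ignore the global PAX header
        NF < 2 { mixed = 1; next }            # a top-level file: nothing to strip
        !seen { first = $1; seen = 1; next }  # remember the first directory seen
        $1 != first { mixed = 1 }             # a different directory: no common prefix
        END { if (seen && !mixed) print first "/" }
    '
}

# Example:
#   printf '%s\n' pax_global_header my-project-1.0.0/README my-project-1.0.0/src/main.c |
#       common_leading_dir
```

Without the first awk rule, the header entry would count as a top-level file and no prefix would be detected, which is the failure the commit message describes.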
