In a git-based CMS, how do I uniquely identify a file in the git repository?

Question

I'm working on a simple CMS (based on Django, not that it matters) similar to Jekyll and Hyde, but dynamic instead of static. The idea is that the server has a copy of the repository, I can push stuff into that, and the CMS will automatically pick up the new content.

Let's say that Markdown-formatted blog posts in my repository follow this file naming scheme:

/blog/2010/08/14/my-blog-post.md

Internally, the processed files will be cached in a SQLite database under a unique ID, for easy searching and fast serving.

The problem is in constructing the URLs in such a way that they can be mapped to files in the repository. I see several options:

/blog/2010/08/14/my-blog-post
If I simply map (part of) the URL to a filename, renaming a file will break all links pointing to that file. The content admin can leave a symlink in place of the old file, which the CMS can map into an HTTP redirect, but this requires work that is easy to forget.
/blog/2010/08/14/271-my-blog-post
If I include a database ID in every URL, clearing or rebuilding the cache will invalidate all IDs, which is even worse. I would like the git repository to be the only thing representing the site's contents; everything else should be reconstructible from that.
/blog/2010/08/14/528dc05-my-blog-post
The only thing uniquely identifying a file in the repo over time, as far as I can tell, is a (filename, commit SHA1) pair. That file is guaranteed to exist in that commit, and we can trace it to the current HEAD through the git log.
(I won't include the full SHA1, but just enough to make collisions sufficiently unlikely. Will do the math later.)
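
The deferred math can be estimated with the birthday bound. The sketch below is only an approximation; the function name and the object counts are illustrative assumptions, not figures from any real repository. Since git resolves an abbreviated hash against the objects in the repository, `n_objects` is probably best taken as the total object count rather than the number of commits alone.

```python
import math


def collision_probability(n_objects: int, hex_digits: int) -> float:
    """Approximate chance that at least two of n_objects share the same
    hash prefix of the given length (birthday-bound approximation)."""
    space = 16 ** hex_digits  # number of distinct prefixes
    exponent = n_objects * (n_objects - 1) / (2.0 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    # Hypothetical repository sizes, chosen only for illustration.
    for n in (1_000, 100_000):
        for digits in (7, 10, 12):
            p = collision_probability(n, digits)
            print(f"{n:>7} objects, {digits:>2} hex digits: {p:.3e}")
```

Each extra hex digit divides the collision probability by roughly 16, so a few more characters in the URL buy a lot of safety.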

My question is twofold:

Is there an easy and fast way in git to track a (filename, SHA1) pair through renames to the corresponding filename in the current HEAD?
Is there a better way to accomplish my goals: not breaking existing URLs, but still allowing for renames and cache rebuilds?

Answer 1

Easy/fast? Not sure, but I don't think so. Git tracks the contents of files as blobs. The filenames of those blobs are then stored in tree objects. Commits, in turn, point to tree objects and add some metadata such as the committer, a timestamp, and a parent commit.
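
To illustrate that a blob is identified by its content alone, here is a minimal sketch that computes a blob ID the way git does for repositories using the default SHA-1 object format. The function name is hypothetical; the filename never enters the hash, which is why a pure rename leaves the blob ID unchanged.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID git assigns to a blob with this content.

    Git hashes the header "blob <size>" followed by a NUL byte and then
    the raw file content. Assumes the SHA-1 object format.
    """
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Same bytes give the same ID, whatever the file is called.
    print(git_blob_id(b"# My blog post\n"))
```

The result should match what `git hash-object <file>` prints for a file with the same bytes.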

I don't think Git actually stores renames as such; a rename is merely a difference between trees that point to the same blobs.
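
Because renames are inferred rather than recorded, git can be asked to detect them when comparing two trees. The following is a rough sketch, assuming `git` is on the PATH and that paths contain no tabs or characters git would quote; the helper names `parse_renames` and `renames_since` are made up for this example. Detection is heuristic: a file that was both renamed and heavily edited between the two commits may not be reported as a rename.

```python
import subprocess
from typing import Dict


def parse_renames(name_status: str) -> Dict[str, str]:
    """Extract {old_path: new_path} from `git diff --name-status -M` output.

    Rename lines look like "R<score>\t<old>\t<new>".
    """
    renames = {}
    for line in name_status.splitlines():
        fields = line.split("\t")
        if len(fields) == 3 and fields[0].startswith("R"):
            renames[fields[1]] = fields[2]
    return renames


def renames_since(commit: str, repo: str = ".") -> Dict[str, str]:
    """Compare the tree of `commit` with HEAD and return detected renames."""
    out = subprocess.run(
        ["git", "-C", repo, "diff", "--name-status", "-M",
         "--diff-filter=R", commit, "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_renames(out)
```

With a (filename, SHA1) pair from a URL, `renames_since(sha1).get(filename, filename)` would give a candidate for the current path, which still has to be checked for existence in HEAD.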

I think the best you can do is to use /path/to/file as the URL and, when you don't find that file in HEAD, iteratively scan backwards through the history to find the most recent commit in which it existed.
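
One way to sketch that backwards scan is shown below. It is a hedged illustration rather than a tested implementation: `resolve_path`, `run_git`, and `max_hops` are hypothetical names, the git runner is injectable so the logic can be exercised without a repository, and the approach assumes a mostly linear history (by default `git diff-tree` reports nothing for merge commits).

```python
import subprocess
from typing import Callable, Optional, Sequence, Tuple

GitRunner = Callable[[Sequence[str]], Tuple[int, str]]


def run_git(args: Sequence[str]) -> Tuple[int, str]:
    """Run git in the current directory; return (exit code, stdout)."""
    proc = subprocess.run(["git", *args], capture_output=True, text=True)
    return proc.returncode, proc.stdout


def resolve_path(path: str, run: GitRunner = run_git,
                 max_hops: int = 20) -> Optional[str]:
    """Follow renames forward until `path` exists in HEAD, else None."""
    for _ in range(max_hops):
        # `git cat-file -e` exits with 0 when the object exists.
        code, _out = run(["cat-file", "-e", f"HEAD:{path}"])
        if code == 0:
            return path
        # Most recent commit touching the path: the one that removed
        # or renamed it, since it no longer exists in HEAD.
        code, out = run(["log", "-1", "--format=%H", "--", path])
        commit = out.strip()
        if code != 0 or not commit:
            return None
        # List that commit's changes with rename detection enabled.
        code, out = run(["diff-tree", "--no-commit-id", "--name-status",
                         "-r", "-M", commit])
        new_path = None
        for line in out.splitlines():
            fields = line.split("\t")
            if (len(fields) == 3 and fields[0].startswith("R")
                    and fields[1] == path):
                new_path = fields[2]
                break
        if new_path is None:
            return None  # deleted, or the rename was not detected
        path = new_path
    return None
```

A CMS could call `resolve_path("blog/2010/08/14/my-blog-post.md")` on a cache miss and answer with an HTTP redirect when the returned path differs from the requested one.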

If you're going to be doing this kind of repository-level stuff, I recommend you pick up a copy of Peepcode's Git Internals, which quite clearly explains the inner workings of a git repository.

Answer 1: 0 votes, accepted, 2010-08-14 12:01:52
