What does the Git Smart HTTP(S) protocol fully look like in all its glory?

I'm trying to implement a web server that simulates a Git remote. Users should be able to clone or pull from my server, edit files, commit, and push (with authentication): normal things to do with Git. However, there is no bare Git repository or anything like it on the server side; data is stored in other formats and only converted when requested.

I've spent a lot of time trying to find out how the Git Smart HTTP protocol works, and here's what I know so far.

From the Git docs on http-protocol, I know that GET $GIT_URL/info/refs?service=git-upload-pack HTTP/1.1 should elicit the following (example) response:

HTTP/1.1 200 OK<CRLF>
Content-Type: application/x-git-upload-pack-advertisement<CRLF>
Cache-Control: no-cache<CRLF>
<CRLF>
001e# service=git-upload-pack<LF>
0000<no LF>
004895dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint<NUL>multi_ack<LF>
003fd049f6c27a2244e12041955e262a404c7faba355 refs/heads/master<LF>
003c2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0<LF>
003fa3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{}<LF>
0000

From my own experimentation with a repo of mine that has very few commits, it seems GitHub is so far entirely within the limits of the protocol as described in the docs:

HTTP/1.1 200 OK<CRLF>
Server: GitHub Babel 2.0<CRLF>
Content-Type: application/x-git-upload-pack-advertisement<CRLF>
Content-Security-Policy: default-src 'none'; sandbox<CRLF>
Transfer-Encoding: chunked<CRLF>
expires: Fri, 01 Jan 1980 00:00:00 GMT<CRLF>
pragma: no-cache<CRLF>
Cache-Control: no-cache, max-age=0, must-revalidate<CRLF>
Vary: Accept-Encoding<CRLF>
X-Frame-Options: DENY<CRLF>
X-GitHub-Request-Id: [redacted]<CRLF>
<CRLF>
001e# service=git-upload-pack<LF>
0000<no LF>0156feee8d0aeff172f5b39e3175175d027f3fd5ecc1 HEAD<NUL>multi_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed allow-tip-sha1-in-want allow-reachable-sha1-in-want no-done symref=HEAD:refs/heads/master filter object-format=sha1 agent=git/github-g69d6dd5d35d8<LF>
003ffeee8d0aeff172f5b39e3175175d027f3fd5ecc1 refs/heads/master<LF>
0000
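
For illustration, a response body of this shape can be taken apart with a few lines of Python. This is only a sketch: parse_advertisement is a hypothetical helper name, it assumes the whole (already de-chunked) body is in memory, and it ignores special cases such as an empty repository.

```python
def parse_advertisement(body: bytes):
    """Return (refs, capabilities) parsed from an info/refs smart-HTTP body."""
    refs = {}          # ref name -> object id (hex string)
    capabilities = []  # taken from the first ref line, after the NUL byte
    pos = 0
    while pos < len(body):
        length = int(body[pos:pos + 4], 16)
        if length == 0:
            pos += 4   # "0000" flush packet: four bytes, no payload
            continue
        if length < 4:
            raise ValueError(f"bad pkt-line length {length}")
        # The length counts the 4-byte prefix itself.
        payload = body[pos + 4:pos + length].rstrip(b"\n")
        pos += length
        if payload.startswith(b"# service="):
            continue   # service announcement line
        head, _, caps = payload.partition(b"\x00")
        if caps and not capabilities:
            capabilities = caps.decode().split(" ")
        sha, name = head.split(b" ", 1)
        refs[name.decode()] = sha.decode()
    return refs, capabilities
```

Fed the documentation example above, this should yield four refs and the single capability multi_ack.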

However, this is where the easy part ends. What if I want to actually get that commit data? The Git docs on the matter give an example of the POST request to send, and some grammar, and then say "TODO: Document this further".

I tried experimenting by curling GitHub in the format I see in the docs.

(cwd)>curl https://github.com/Kenny2github/ConvoSplit.git/git-upload-pack -o - -i -X POST -d @-
0032want feee8d0aeff172f5b39e3175175d027f3fd5ecc1
0032have 941ea62275547bcbfb78fd97d29be18d09a78190
0009done
0000
^Z
HTTP/1.1 200 OK
Server: GitHub Babel 2.0
Content-Type: application/x-git-upload-pack-result
Content-Security-Policy: default-src 'none'; sandbox
Transfer-Encoding: chunked
expires: Fri, 01 Jan 1980 00:00:00 GMT
pragma: no-cache
Cache-Control: no-cache, max-age=0, must-revalidate
Vary: Accept-Encoding
X-GitHub-Request-Id: [redacted]
X-Frame-Options: DENY

curl: (18) transfer closed with outstanding read data remaining

What?

I tried using Python:

>>> import requests
>>> requests.post('https://github.com/Kenny2github/ConvoSplit.git/git-upload-pack', data=b'''
0032want feee8d0aeff172f5b39e3175175d027f3fd5ecc1
0032have 941ea62275547bcbfb78fd97d29be18d09a78190
0009done
0000
'''.strip())
Traceback (most recent call last):
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\response.py", line 572, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\response.py", line 331, in _error_catcher
    yield
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\response.py", line 637, in read_chunked
    self._update_chunk_length()
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\response.py", line 576, in _update_chunk_length
    raise httplib.IncompleteRead(line)
http.client.IncompleteRead: IncompleteRead(0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\models.py", line 751, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\response.py", line 461, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\response.py", line 665, in read_chunked
    self._original_response.close()
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\response.py", line 349, in _error_catcher
    raise ProtocolError('Connection broken: %r' % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<pyshell#17>", line 1, in <module>
    requests.post('https://github.com/Kenny2github/ConvoSplit.git/git-upload-pack', data=b'0032want feee8d0aeff172f5b39e3175175d027f3fd5ecc1\n0032have 941ea62275547bcbfb78fd97d29be18d09a78190\n0009done\n0000')
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\api.py", line 119, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\sessions.py", line 685, in send
    r.content
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\models.py", line 829, in content
    self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\models.py", line 754, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

The rest of the http-protocol docs don't help: another six TODOs appear. The pack-protocol docs at least give me an idea of what I'm supposed to be receiving, but no indication of how.

The Transfer Protocols docs tell me nothing new, and then say "take a look at the Git source code". I tried that, but it's hardcore C and I'd have to understand basically the entire infrastructure of Git itself. (I may yet attempt to do that, but now is not the time.)

I did manage to glean that git upload-pack is involved, and running git upload-pack --stateless-rpc --advertise-refs .git did give me the /info/refs list like before. However, attempts to get an actual pack out of it failed, and not only did they fail, they failed inconsistently between platforms.

On Windows:

(cwd)>git upload-pack --stateless-rpc .git
0032want feee8d0aeff172f5b39e3175175d027f3fd5ecc1
0009done # I hit Enter and nothing else
fatal: protocol error: bad line length character:
000

(cwd)>git upload-pack --stateless-rpc .git
0032want feee8d0aeff172f5b39e3175175d027f3fd5ecc1
0000 # likewise
fatal: protocol error: bad line length character:
000

(cwd)>py -c "print('0032want feee8d0aeff172f5b39e3175175d027f3fd5ecc1\n0009done\n0000')" | git upload-pack --stateless-rpc .git
fatal: protocol error: bad line length character:
000

Suspecting that carriage returns were causing the problems, I tried WSL:

$ git upload-pack --stateless-rpc .git
0032want feee8d0aeff172f5b39e3175175d027f3fd5ecc1
0000 # I hit Enter and then ^D after 0000
fatal: The remote end hung up unexpectedly

$ git upload-pack --stateless-rpc .git
0032want feee8d0aeff172f5b39e3175175d027f3fd5ecc1
0009done # I hit Enter and did NOT hit ^D
fatal: git upload-pack: protocol error, expected to get sha, not 'done'

$ # using Python to pipe each of the above inputs yielded the same results

What am I doing wrong? How can I get GitHub/git-upload-pack to respect me?

First of all, it isn't possible to explain the entire protocol in a StackOverflow answer; the explanation is too long. However, I'll try to point out a few things to note.

First, when you speak the protocol, you need to be pretty exact; this is not a case where line ending differences and extra bytes will be tolerated. As such, if you're synthesizing data to pass to the remote, it should be done with printf(1) or a programming language. Don't type things at the shell.

Git uses the pkt-line format, which means that every line or chunk of data is prefixed with a sequence of four hex characters giving the combined length of the data and the prefix itself. If the sequence is 0000, that's a flush packet and it indicates the end of that chunk of data. If the sequence is 0001, that's a delimiter packet and it's used in protocol v2 to delimit sections of that chunk of data. Otherwise, the hex sequence cannot have a value exceeding 65520 (the 4-byte prefix plus at most 65516 bytes of data).
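
As a rough illustration of the framing, here is a minimal Python sketch. The names pkt_line and read_pkt_lines are hypothetical helpers, not part of Git or any library, and the reader assumes the whole buffer is already in memory and does no limit checking.

```python
def pkt_line(payload: bytes) -> bytes:
    """Frame payload as a pkt-line: 4 hex digits of total length, then the payload."""
    return f"{len(payload) + 4:04x}".encode("ascii") + payload


def read_pkt_lines(data: bytes):
    """Yield ("flush", b""), ("delim", b"") or ("data", payload) for each packet."""
    pos = 0
    while pos < len(data):
        length = int(data[pos:pos + 4], 16)
        if length == 0:
            yield ("flush", b"")   # 0000
            pos += 4
        elif length == 1:
            yield ("delim", b"")   # 0001, protocol v2 only
            pos += 4
        elif length < 4:
            raise ValueError(f"bad pkt-line length {length}")
        else:
            # The length includes the 4-byte prefix.
            yield ("data", data[pos + 4:pos + length])
            pos += length
```

For example, pkt_line(b"done\n") gives b"0009done\n": four bytes of prefix, four of "done", and one for the newline.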

In your situation, where you're sending want and have lines, you're expected to do multiple iterations until the server sends you a pack. In HTTP, that's multiple requests. The server will send you acknowledgements for the have arguments you've specified. The server expects to find a path from each want directive to an object both sides have (or else, that the client has nothing, in which case the client's repository is empty).

Be aware that this task is actually quite involved. There's now a v2 of the protocol for fetches (the old one was v0, and there's a v1, which is the same but with a version header). You should also expect to be able to support SHA-256 repositories, which don't currently interoperate with SHA-1 repositories, but are otherwise supported. Git also provides a large number of extensions which you will in practice want to support, like the sideband functionality, which is required if you want to provide output to the user about what your side is doing.

The documentation mostly lives in Documentation/technical in the Git repository. It is incomplete in some places, but you should mostly be able to discern it with some reading and testing.

Okay, after some more experimentation I happened upon the right combination by random chance, if you will.
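
To make the sideband point concrete: once side-band or side-band-64k has been negotiated, each data pkt-line in the pack response carries one extra leading byte naming a band (1 for pack data, 2 for progress messages, 3 for a fatal error message). A minimal Python sketch, assuming the pkt-line payloads have already been split out of the stream; split_sideband is a hypothetical helper name.

```python
def split_sideband(payloads):
    """Separate sideband payloads into (pack_bytes, progress_bytes)."""
    pack = bytearray()
    progress = bytearray()
    for payload in payloads:
        if not payload:
            continue
        band, data = payload[0], payload[1:]
        if band == 1:      # pack data
            pack += data
        elif band == 2:    # progress text, normally shown to the user
            progress += data
        elif band == 3:    # fatal error reported by the remote
            raise RuntimeError("remote error: " + data.decode(errors="replace"))
        else:
            raise ValueError(f"unknown sideband channel {band}")
    return bytes(pack), bytes(progress)
```

A server implementation would do the reverse: prefix pack chunks with byte 1 and any "Counting objects" style messages with byte 2 before framing them as pkt-lines.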

$ git upload-pack --stateless-rpc .git > tmp.pack
0032want feee8d0aeff172f5b39e3175175d027f3fd5ecc1
00000009done # Enter with NO ^D
Counting objects: 16, done.
Compressing objects: 100% (14/14), done.
Total 16 (delta 3), reused 0 (delta 0)
$ hd tmp.pack
00000000  30 30 30 38 4e 41 4b 0a  50 41 43 4b 00 00 00 02  |0008NAK.PACK....|
00000010  00 00 00 10 94 2f 78 9c  a5 92 4f 6f db 30 0c c5  |...../x...Oo.0..|
...
>>> import requests
>>> # omitting the trailing \n results in a 200 OK blank response
>>> r = requests.post('https://github.com/Kenny2github/ConvoSplit.git/git-upload-pack', data=b'0032want feee8d0aeff172f5b39e3175175d027f3fd5ecc1\n00000009done\n')
>>> r.text[:20]
'0008NAK\nPACK\x00\x00\x00\x02\x00\x00\x00\x10'
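
The first bytes of that response can be decoded as follows. This is a sketch that assumes no sideband was negotiated (the request above sends no capabilities) and exactly one status line before the pack; parse_response_start is a hypothetical name. Since a pack is binary, r.content is a safer thing to feed it than r.text.

```python
import struct


def parse_response_start(data: bytes):
    """Return (status, pack_version, object_count) from the start of a response."""
    length = int(data[:4], 16)                     # pkt-line length, prefix included
    status = data[4:length].rstrip(b"\n").decode("ascii")
    # Pack header: 4-byte signature, then version and object count,
    # both as 32-bit big-endian integers.
    signature, version, count = struct.unpack(">4sII", data[length:length + 12])
    if signature != b"PACK":
        raise ValueError("expected a pack after the status line")
    return status, version, count
```

On the 20 bytes shown above this gives a NAK status, pack version 2 and 16 objects, which matches the "Total 16" reported by git upload-pack.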

However, this only offers me control over which commits I want. If I try to specify which commits I have (like I should be able to), I only get ACKs for my haves:

>>> print(requests.post('https://github.com/Kenny2github/ConvoSplit.git/git-upload-pack', data=b'''
0032want feee8d0aeff172f5b39e3175175d027f3fd5ecc1
00000032have 941ea62275547bcbfb78fd97d29be18d09a78190
0032have 93dbc9cfb21d23c6eb5313419bfaa8213619c73c
0032have 648508d6359b3e8992ee5a6d9fee6f86110202fd
00000009done
'''.lstrip()).text)
0031ACK 941ea62275547bcbfb78fd97d29be18d09a78190
0031ACK 93dbc9cfb21d23c6eb5313419bfaa8213619c73c
0031ACK 648508d6359b3e8992ee5a6d9fee6f86110202fd

(Same deal if I try with git upload-pack.) How do I properly handle the rest of the whole process? Once more, I'm aiming to simulate a(n essentially) complete Git remote.
