
How to get the source code of files in a commit with GitPython?

I need to get the source code of all files in a commit. Currently I am using PyDriller and it works well, but for performance reasons I need to use GitPython. I have tried this solution:

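import io
from git import Repo

repo = Repo('path to repo')
commit = repo.commit('my hash')
# target_file is a blob taken from the commit (how it was selected is not shown)
with io.BytesIO(target_file.data_stream.read()) as f:
    print(f.read().decode('utf-8'))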

But I get this error:

Traceback (most recent call last):
  File "D:\Programmi\Python36\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "D:\Programmi\Python36\lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "D:/Workspaces/PythonProjects/fixing-commit/crop_data_preparing_gitpython.py", line 82, in get_commit_data_gitpython
    print(f.read().decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9f in position 18: invalid start byte

I thought this might be an encoding problem, but even changing the encoding from utf-8 to latin-1 doesn't help.

Is there another strategy that would help me get the code of those files using GitPython?

As the first comment suggested, I'd recommend PyDriller for this; it's much easier:

from pydriller import RepositoryMining

for commit in RepositoryMining("repo").traverse_commits():
    for modified_file in commit.modifications:
        # Source code of the file as of this commit
        modified_file.source_code

It also takes care of decoding, renames, etc. The source code before the commit is available as well (modified_file.source_code_before).
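
If you still need to stay on GitPython, here is a rough sketch of one way to do it. It assumes the 0x9f byte in the traceback comes from a blob that is not valid UTF-8 (possibly a binary file), which is only a guess. The helper names decode_source and iter_commit_sources are made up for this example, and treating any blob containing a NUL byte as binary is a heuristic rather than something GitPython provides. Note that it walks every file in the commit's snapshot, not only the files the commit changed.

```python
def decode_source(data):
    """Return the text for raw blob bytes, or None if it looks binary.

    Hypothetical helper: a NUL byte is used as a rough "binary" signal.
    """
    if b"\x00" in data:
        return None
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Fallback: latin-1 maps every byte to a character, so it never
        # raises, but the result may be wrong for other encodings.
        return data.decode("latin-1")


def iter_commit_sources(repo_path, rev):
    """Yield (path, text_or_None) for every file in the commit's tree."""
    from git import Repo  # GitPython, imported lazily

    repo = Repo(repo_path)
    commit = repo.commit(rev)
    for item in commit.tree.traverse():
        # traverse() also yields sub-trees and submodules; keep blobs only
        if item.type != "blob":
            continue
        yield item.path, decode_source(item.data_stream.read())
```

You could then loop with something like: for path, text in iter_commit_sources('path to repo', 'my hash'), and skip the entries where text is None.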
