
Why do scripts behave differently when called from the command line vs. git attributes?

Updated scripts are attached below; they now work on my sample document.

Why do the following Python scripts behave differently when called via git attributes than when called from the command line?

I have two scripts modeled on the Mercurial zipdoc functionality. All I'm attempting to do is unzip docx files on store (filter.clean) and zip them on restore (filter.smudge). Both scripts work well on their own, but once I put them into git attributes they stop working, and I don't understand why.

I tested them as follows.

Testing the Scripts (git bash)

$ cat original.docx | python ~/Documents/pyscripts/unzip.py > uncompress.docx

$ cat uncompress.docx | python ~/Documents/pyscripts/zip.py > compress.docx

$ md5sum uncompress.docx compress.docx

I can open both the uncompressed and the compressed file in Microsoft Word with no errors. The scripts work as expected.
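Note that md5sum on the two outputs compares the container bytes, which differ by design between a stored and a deflated archive. A content-level check compares the decompressed members instead. The following is a minimal sketch in Python 3 (not the Python 2 of the scripts below); the helper names `make_archive` and `member_contents` and the sample member are hypothetical and used only for illustration.

```python
import io
import zipfile


def make_archive(members, compression):
    """Build a zip archive in memory from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


def member_contents(archive_bytes):
    """Return {member name: decompressed bytes} for a zip archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes), "r") as zf:
        return {info.filename: zf.read(info) for info in zf.infolist()}


# Hypothetical sample standing in for the members of a .docx file.
members = {"word/document.xml": b"<w:document>hello</w:document>" * 50}
stored = make_archive(members, zipfile.ZIP_STORED)
deflated = make_archive(members, zipfile.ZIP_DEFLATED)

# The container bytes differ, but the member contents are the same.
print(stored == deflated)
print(member_contents(stored) == member_contents(deflated))
```

Applied to real files, reading original.docx and compress.docx as bytes and passing them to `member_contents` would show whether the round trip preserved every member.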

Testing Git Attributes

  1. I set both clean and smudge to cat and verified that my file saves and restores without problems.
  2. I set clean to python ~/Documents/pyscripts/unzip.py. After a commit and checkout the file is larger (uncompressed) but produces errors when opened in MS Word. Its md5 also does not match the "script only" test above, although the file size is identical.
  3. I set clean back to cat and set smudge to python ~/Documents/pyscripts/zip.py. After a commit and checkout the file is smaller (compressed) but again produces errors when opened in MS Word. Again the md5 differs from the "script only" test, but the file size matches.
  4. Setting both clean and smudge to the Python scripts produces an error, as expected.

I'm really lost here. I thought git attributes simply provide the input on stdin and read the result from stdout. I've tested both scripts with a pipe from cat and a redirect of the output, and they work fine. I know the scripts are running because the files change size as expected; however, a small change is introduced somewhere in the file.
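One way to get closer to the git situation than `cat ... | script > file` is to run the filter with pipes on both ends and compare the bytes. The sketch below is Python 3 and only approximates how git drives a filter; `run_filter` and the pass-through command are hypothetical stand-ins, not part of git or of my scripts.

```python
import subprocess
import sys


def run_filter(command, data):
    """Feed data to command on stdin and return what it writes to stdout."""
    result = subprocess.run(command, input=data,
                            stdout=subprocess.PIPE, check=True)
    return result.stdout


# Hypothetical stand-in for a filter script: copies stdin to stdout as bytes.
passthrough = [sys.executable, "-c",
               "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]

# Bytes that text-mode I/O tends to mangle: bare LF, CRLF, Ctrl-Z, NUL.
payload = b"PK\x03\x04\n\r\n\x1a\x00" * 100
output = run_filter(passthrough, payload)

# True when the filter is byte-for-byte transparent over pipes.
print(output == payload)
```

Replacing `passthrough` with the real command (for example the unzip.py invocation above) would exercise the script with stdout attached to a pipe rather than to a file.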

Additional References

I'm using msysgit on Windows 7; all of the commands above were typed into the git bash window.

Git Attributes Description

Uncompress Script

import sys
import zipfile

# Set stdin and stdout to binary read/write
if sys.platform == "win32":
    import os, msvcrt
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
    msvcrt.setmode(sys.stdin.fileno(), os.O_BINARY)

# Prefer the faster C implementation when it is available (Python 2)
try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO

# Wrap stdio into a file like object
inString = StringIO(sys.stdin.read())
outString = StringIO()

# Store each member uncompressed
try:
    with zipfile.ZipFile(inString,'r') as inFile:
        outFile = zipfile.ZipFile(outString,'w',zipfile.ZIP_STORED)
        for memberInfo in inFile.infolist():
            member = inFile.read(memberInfo)
            memberInfo.compress_type = zipfile.ZIP_STORED
            outFile.writestr(memberInfo,member)
        outFile.close()
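except zipfile.BadZipfile:
    # Not a zip archive: pass the input through unchanged
    sys.stdout.write(inString.getvalue())
else:
    # Emit the finished archive in a single write
    sys.stdout.write(outString.getvalue())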

Compress Script

import sys
import zipfile

# Set stdin and stdout to binary read/write
if sys.platform == "win32":
    import os, msvcrt
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
    msvcrt.setmode(sys.stdin.fileno(), os.O_BINARY)

# Prefer the faster C implementation when it is available (Python 2)
try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO

# Wrap stdio into a file like object
inString = StringIO(sys.stdin.read())
outString = StringIO()

# Store each member compressed
try:
    with zipfile.ZipFile(inString,'r') as inFile:
        outFile = zipfile.ZipFile(outString,'w',zipfile.ZIP_DEFLATED)
        for memberInfo in inFile.infolist():
            member = inFile.read(memberInfo)
            memberInfo.compress_type = zipfile.ZIP_DEFLATED
            outFile.writestr(memberInfo,member)
        outFile.close()
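except zipfile.BadZipfile:
    # Not a zip archive: pass the input through unchanged
    sys.stdout.write(inString.getvalue())
else:
    # Emit the finished archive in a single write
    sys.stdout.write(outString.getvalue())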

Edit: Formatting

Edit 2: Scripts updated to write to memory rather than to stdout during file processing.

I found that using zipfile.ZipFile() with stdout as the target was causing the problem. Opening the zip file with a StringIO() as the target, and then writing the complete StringIO contents to stdout at the end, solved it.
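A likely explanation, stated as an assumption rather than something I have confirmed: zipfile records offsets while it writes, so it behaves best with a seekable target. A shell redirect to a file is seekable, while the pipe that git attaches to a filter is not. The sketch below shows the buffer-then-write pattern in Python 3 (the scripts above are Python 2); `recompress` is a hypothetical helper name.

```python
import io
import zipfile


def recompress(data, compression):
    """Rewrite a zip archive with every member stored using `compression`.

    Input that is not a zip archive is returned unchanged.
    """
    out = io.BytesIO()  # seekable in-memory target, unlike a pipe
    try:
        with zipfile.ZipFile(io.BytesIO(data), "r") as src, \
                zipfile.ZipFile(out, "w", compression) as dst:
            for info in src.infolist():
                payload = src.read(info)
                info.compress_type = compression
                dst.writestr(info, payload)
    except zipfile.BadZipFile:
        return data
    return out.getvalue()


# Build a small deflated archive in memory as sample input.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("word/document.xml", b"<w:p>text</w:p>" * 200)

stored = recompress(sample.getvalue(), zipfile.ZIP_STORED)
restored = recompress(stored, zipfile.ZIP_DEFLATED)

# Stored output is larger than the deflated input; deflating again shrinks it.
print(len(sample.getvalue()), len(stored), len(restored))
```

In a filter script, the only write to stdout would then be a single call along the lines of `sys.stdout.buffer.write(recompress(sys.stdin.buffer.read(), zipfile.ZIP_STORED))`.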

I haven't tested this extensively, and it's possible that some .docx contents won't be handled well, but only time will tell. My test files now open without error, and as a bonus the .docx file in the working directory is smaller, because the script uses higher compression than the standard .docx format.

I have confirmed that after several edits and commits to a .docx file I can open the file, edit one line, and commit without adding a large delta to the repo size. For example, for a 19KB file with 3 previous edits in the repo history, adding a new line at the top created a delta of only 1KB in the repo after garbage collection. Running the same test (as closely as I could) with Mercurial resulted in a 9.3KB delta commit. I'm no Mercurial expert; my understanding is that Mercurial has no "gc" command, so none was run.
