Python - How to correctly get content between two offsets in a file?

I'm trying to get the content between two offsets (essentially part of a file). For that, I found fileslice to be useful.

For testing, I'm using a file called hello that contains the string:

helloworld

I deliberately left a trailing newline in the file, since I'm testing different things.

I'm using this code:

from fileslice import Slicer
import sys

r = open('hello', 'r')

slicer = Slicer(r)

start = int(sys.argv[1])
size = int(sys.argv[2])
fileslice = slicer(start, size)

sys.stdout.write(fileslice.read())

The problem I'm facing is that, for certain offset ranges, the output seems to include the wrong character for the given offsets:

:~/fileslice$ wc -c hello # using wc to check the size
11 hello
:~/fileslice$ python -u "/home/user/fileslice/testslice.py" 0 11 | xxd # works
00000000: 6865 6c6c 6f77 6f72 6c64 0a              helloworld.
:~/fileslice$ python -u "/home/user/fileslice/testslice.py" 0 10 | xxd # works
00000000: 6865 6c6c 6f77 6f72 6c64                 helloworld
:~/fileslice$ python -u "/home/user/fileslice/testslice.py" 1 10 | xxd # doesn't work as expected
00000000: 656c 6c6f 776f 726c 640a                 elloworld.

Here I'm using the test file and code mentioned above. I first check the file's size with wc, then run a few tests and inspect the output in hex with xxd.

As can be seen, the runs commented "works" behave as expected: I get the content between the two offsets just fine.

In the last run, however, I wanted the content starting at the character e (offset 1). That part works, but notice that the newline (offset 10), which was excluded in the previous test, appears again, unlike in the previous test, which worked as expected.

How can I correctly get the content of a file using two offsets (start/end)?

The size is the distance between the two offsets, i.e. end minus start.

size = int(sys.argv[2]) - int(sys.argv[1])
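
To illustrate the same arithmetic without fileslice, here is a minimal sketch that uses only the built-in seek and read. The helper name read_between is hypothetical and is not part of fileslice. The file is opened in binary mode on the assumption that the offsets are byte offsets, as reported by wc -c and xxd.

```python
import os
import tempfile


def read_between(path, start, end):
    """Return the bytes in the half-open range [start, end) of the file.

    Hypothetical helper for illustration; it is not part of fileslice.
    """
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    # Binary mode, so offsets are byte offsets (matching wc -c and xxd).
    with open(path, "rb") as f:
        f.seek(start)
        # The size to read is the distance between the offsets.
        return f.read(end - start)


if __name__ == "__main__":
    # Recreate the test file from the question: "helloworld" plus a newline.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"helloworld\n")
    try:
        # End offset 10 is excluded, so the newline is not read.
        print(read_between(tmp.name, 1, 10))
        # End offset 11 includes the newline at offset 10.
        print(read_between(tmp.name, 1, 11))
    finally:
        os.remove(tmp.name)
```

With start 1 and end 10 the size is 9, so the read stops just before the newline at offset 10. Passing 10 as a size, as in the original script, reads offsets 1 through 10 and therefore includes it.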
