How do I read a large file (socket programming and Python)?

Question

I'm a beginner in socket programming and Python. I would like to learn how to send a large text file (e.g., > 5MB) from the server to the client. I keep getting an error that says:

Traceback (most recent call last):
  File "fserver.py", line 50, in <module>
    reply = f.read()
ValueError: Mixing iteration and read methods would lose data

Below is part of my code. Can someone take a look and give me some hints on how to resolve this issue? Thank you for your time.

myserver.py

#validate filename
        if os.path.exists(filename):
            with open(filename) as f:
                for line in f:
                    reply = f.read()
                    client.send(reply)
            #f = open(filename, 'r')
            #reply = f.read()
            #client.send(piece)
        else:
            reply = 'File not found'
            client.send(reply)

myclient.py

while True:
    print 'Enter a command: list or get <filename>'
    command = raw_input()
    if command.strip() == 'quit':
        break
    client_socket.send(command)

    data = client_socket.recv(socksize)
    print data

Answer 1

The problem here has nothing to do with sockets, or with how big the file is. When you do this:

for line in f:
    reply = f.read()

The for line in f loop is trying to read one line of the file at a time, and then for each line you're trying to read the entire file. That won't work.

If you didn't get this error (which you won't in many cases), the first time through the loop you would read and ignore the first line, then read and send everything but the first line (or, possibly, everything but the first, say, 4KB) as one giant reply, and then the loop would be done.
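To see the "read and ignore the first line" behavior without the exception getting in the way, here is a minimal sketch. It assumes Python 3 and uses io.StringIO as an in-memory stand-in for the file, where this pattern runs without raising; the function name mixed_read is made up for illustration.

```python
import io

def mixed_read(f):
    """Reproduce the loop from the question and record what each step saw."""
    skipped = []   # lines consumed by the for loop and never sent
    replies = []   # what f.read() returned inside the loop
    for line in f:
        skipped.append(line)
        replies.append(f.read())  # swallows the rest of the file
    return skipped, replies

skipped, replies = mixed_read(io.StringIO("line 1\nline 2\nline 3\n"))
print(skipped)  # ['line 1\n']
print(replies)  # ['line 2\nline 3\n']
```

The loop body runs exactly once: the first line is consumed by the iteration, and everything after it goes out as a single reply.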

What you want is one or the other:

for line in f:
    reply = line

… or …

# no for loop
reply = f.read()
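Spelled out a little further, the two options might look like the sketch below. It assumes Python 3, where sockets carry bytes, so the file would be opened in binary mode. The send parameter is a stand-in for something like client.sendall, and both function names are made up for illustration.

```python
import io

def send_by_line(f, send):
    """Option 1: iterate over the file and send one line per call."""
    for line in f:
        send(line)

def send_whole(f, send):
    """Option 2: no loop, a single read() for the entire file."""
    send(f.read())

# io.BytesIO stands in for open(filename, 'rb'); list.append stands in for the socket.
data = b"first\nsecond\nthird\n"
by_line, whole = [], []
send_by_line(io.BytesIO(data), by_line.append)
send_whole(io.BytesIO(data), whole.append)
print(len(by_line), len(whole))  # 3 1
```

Either way the same bytes go out; the difference is only how many calls it takes and how much of the file sits in memory at once.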

Meanwhile, on your client side, you're only doing one recv. That's going to get the first 4K (or whatever socksize is) or less, and then you never receive anything else.

What you need is a loop. Like this:

while True:
    data = client_socket.recv(socksize)
    print data

But now you have a new problem. Once the file is done, the client will sit there waiting forever for the next chunk of data, which will never come. So the client needs to know when it's done. And the only way it can know that is if the server puts that information into the data stream.

One way to do this is to send the length before the file. One standardized way to do that is the netstring protocol. You can find libraries that do this for you, but it's simple enough to do by hand. Or maybe do something more like HTTP, where the headers are just separated by newlines, and separated from the body by a blank line; then you can use socket.makefile as your protocol implementation. Or even a binary protocol, where you just send the length as four bytes.
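As a rough illustration of the last option, a four-byte length prefix could be handled along these lines. This is a sketch under assumptions, not the code from any particular library: it assumes Python 3, a big-endian unsigned 32-bit length, and the helper names send_msg, recv_exactly, recv_msg, and roundtrip are all made up here.

```python
import socket
import struct
import threading

HEADER = struct.Struct("!I")  # assumed format: 4-byte unsigned length, network byte order

def send_msg(sock, payload):
    """Send the payload length, then the payload itself."""
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_exactly(sock, n):
    """Keep calling recv until exactly n bytes have arrived."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(min(remaining, 4096))
        if not chunk:
            raise ConnectionError("socket closed before the full message arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def recv_msg(sock):
    """Read the 4-byte length, then read that many bytes."""
    (length,) = HEADER.unpack(recv_exactly(sock, HEADER.size))
    return recv_exactly(sock, length)

def roundtrip(payload):
    """Demo only: push payload through a connected socket pair."""
    a, b = socket.socketpair()
    with a, b:
        # Send from a thread so a large payload cannot fill the buffer and deadlock.
        sender = threading.Thread(target=send_msg, args=(a, payload))
        sender.start()
        received = recv_msg(b)
        sender.join()
    return received

print(len(roundtrip(b"x" * 5000000)))  # 5000000
```

With this in place the client knows exactly when the file is complete, because it stops reading once it has received the number of bytes announced in the header.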

There's another problem we might as well fix while we're here: send(reply) doesn't necessarily send the whole reply; it sends anywhere from 1 byte to the whole thing, and returns a number telling you how much got sent. The simple fix is to use sendall(reply), which keeps sending until all of it has gone out (or raises an error).
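To make the difference concrete, this is roughly the loop you would have to write around send yourself if sendall didn't exist. It is a sketch in Python 3; send_all and the TrickleSocket class are hypothetical names, and TrickleSocket is a fake that accepts at most three bytes per call so the partial sends are visible.

```python
def send_all(sock, data):
    """Call send repeatedly until every byte of data has been accepted."""
    view = memoryview(data)
    total = 0
    while total < len(view):
        sent = sock.send(view[total:])  # may accept only part of what is left
        total += sent
    return total

class TrickleSocket:
    """Hypothetical fake socket that accepts at most 3 bytes per send call."""
    def __init__(self):
        self.received = []

    def send(self, data):
        piece = bytes(data[:3])
        self.received.append(piece)
        return len(piece)

fake = TrickleSocket()
send_all(fake, b"hello world")
print(fake.received)  # [b'hel', b'lo ', b'wor', b'ld']
```

A single fake.send(b"hello world") would have delivered only b'hel', which is the failure mode the answer is warning about. In real code, just call sendall.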

And finally: your server is expecting that each recv will get exactly one command, as sent by send. But sockets don't work that way. Sockets are byte streams, not message streams; there's nothing preventing recv from getting, say, just half a command, and then your server will break. So you need some kind of protocol in that direction as well. Again, you could use netstring, or newline-separated messages, or a binary length prefix, but you have to do something.
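For the newline-separated variant, the server side could buffer incoming bytes and only hand out complete lines, something like the sketch below. It assumes Python 3, UTF-8 commands that end in "\n", and a client that is changed to append that newline; iter_commands is a made-up name.

```python
import socket

def iter_commands(sock, bufsize=4096):
    """Yield one complete newline-terminated command at a time."""
    buffer = b""
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:
            break  # peer closed; any unterminated partial command is dropped
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            yield line.decode("utf-8")

# Demo: commands arrive split at awkward places, yet come out whole.
a, b = socket.socketpair()
a.sendall(b"li")
a.sendall(b"st\nget big.txt\nqu")
a.sendall(b"it\n")
a.close()
print(list(iter_commands(b)))  # ['list', 'get big.txt', 'quit']
b.close()
```

However the bytes happen to be grouped by recv, the generator only yields once it has seen the terminating newline, so half a command never reaches the command handler.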

(The blog post linked above has very simple example code for using binary length prefixes as a protocol.)

Answer 2

You can do for line in file.readlines()
