
Send metadata before file with sockets in python

I'm struggling to achieve some "precise" data exchange using sockets. I have a program that can send and receive files over these sockets, and I've set it up to send the filename and the file size in a first stage.

Everything seems to work fine, but sometimes the received data is not what I expect. My guess is that the data from two "send operations" on the Tx side arrives together, so my intended "parse" of the received string is wrong and the program crashes.

Currently my code looks like this for the receiver:

while True:
    c, addr = self.s.accept()
    l = c.recv(1024)
    while (l):
        if stage < 2:
            self.__recvHeader(l)
            stage += 1
        else:
            self.f.write(l)
        l = c.recv(1024)

The __recvHeader function is:

def __recvHeader(self, data):
    line = data.decode("utf-8").split(":")
    if line[0] == "Name":
        self.filename = line[1]
        self.f = open("/tmp/" + self.filename, 'wb')
    elif line[0] == "Size":
        self.size = int(line[1])
    else:
        print("ERROR: " + "".join(line))

And the Tx does this:

# Here I send some headers first, then
l = f.read(1024)
while (l):
    self.s.send(l)
    l = f.read(1024)

The __sendHeader function is:

def __sendHeader(self, name, value):
    self.s.send((name + ":" + value).encode('utf-8'))

The problem, as I see it, is that I cannot set a fixed length for the headers, since the file name and file size may change.

Any idea how to handle this, or how I could turn this data into something of fixed size to avoid the problem? That last option would also need a different "parse", wouldn't it?

I assume you are using TCP/IP sockets. TCP is a stream protocol and knows nothing about your data structures. If you send a "message" in one send() call, there is no guarantee that it will arrive in one recv() call, or that a recv() call will return only one "message". In your case, each header is a message.
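To see why message boundaries cannot be relied on, here is a small sketch that sends two headers with separate calls and records how recv() happens to chunk them. It uses socket.socketpair() as a stand-in for a real client/server connection, and the function name demo_stream_boundaries and the header values are made up for illustration. How many chunks come back depends on timing and the OS, so no particular split is promised.

```python
import socket

def demo_stream_boundaries():
    """Send two headers separately and return the chunks recv() produced."""
    tx, rx = socket.socketpair()           # a connected pair of stream sockets
    with tx, rx:
        tx.sendall(b"Name:report.txt")     # first "message"
        tx.sendall(b"Size:2048")           # second "message"
        tx.shutdown(socket.SHUT_WR)        # tell the receiver no more data is coming
        chunks = []
        while True:
            data = rx.recv(1024)
            if not data:                   # b"" means the sender has finished
                break
            chunks.append(data)
        return chunks

if __name__ == "__main__":
    # Often prints a single chunk holding both headers glued together,
    # but any split of the same bytes is legal for a stream socket.
    print(demo_stream_boundaries())
```

The only thing the stream guarantees is that the bytes arrive in order; where one recv() ends and the next begins carries no meaning.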

So you have to delimit your messages in some way, so that the receiver can correctly receive and parse them. You have basically two options:

  1. First send the length (number of bytes) of the header followed by the header data. The receiver first reads the length and then reads that many bytes.
  2. Send a delimiter after each header. The receiver reads header data until the delimiter is received.

In the first option you have to decide how to send the length. If you use a multi-byte value, such as a 32-bit integer, you should convert it to network byte order before sending. See htonl (in Python, socket.htonl, or struct.pack with the "!" format prefix).
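A minimal sketch of the first option, assuming a 4-byte unsigned length prefix in network byte order. The helper names send_msg, recv_exact and recv_msg are hypothetical and not part of the question's code; struct.pack("!I", ...) does the byte-order conversion that htonl would do in C.

```python
import socket
import struct

def send_msg(sock: socket.socket, payload: bytes) -> None:
    """Send a 4-byte big-endian length, followed by the payload."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Call recv() repeatedly until exactly n bytes have been collected."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))    # never ask for more than is still missing
        if not chunk:
            raise ConnectionError("socket closed before the message was complete")
        buf += chunk
    return bytes(buf)

def recv_msg(sock: socket.socket) -> bytes:
    """Read one length-prefixed message."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```

With helpers like these, the sender could call send_msg(s, b"Name:report.txt") and send_msg(s, b"Size:2048"), and the receiver would get each header back from its own recv_msg() call regardless of how TCP chunked the bytes. Since the size header tells the receiver how long the file is, the file body can then be read with recv_exact-style looping as well.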

In the second option you could recv() byte-by-byte but this will be very slow. You might want to use some kind of buffering.
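The buffering idea for the second option could look roughly like this. It is only a sketch: the class name LineReader and the choice of a newline as delimiter are assumptions, and it only works if the delimiter can never appear inside a header (so a file name containing a newline would break it).

```python
import socket

class LineReader:
    """Buffers recv() data and returns one delimiter-terminated message at a time."""

    def __init__(self, sock: socket.socket, delimiter: bytes = b"\n"):
        self.sock = sock
        self.delimiter = delimiter
        self.buf = bytearray()             # bytes received but not yet consumed

    def read_message(self) -> bytes:
        while True:
            idx = self.buf.find(self.delimiter)
            if idx != -1:
                msg = bytes(self.buf[:idx])
                # Drop the message and its delimiter, keep whatever follows.
                del self.buf[:idx + len(self.delimiter)]
                return msg
            chunk = self.sock.recv(1024)   # read in large pieces, not byte-by-byte
            if not chunk:
                raise ConnectionError("socket closed before the delimiter arrived")
            self.buf += chunk
```

After both headers have been read, anything still sitting in reader.buf already belongs to the file body and has to be written to the file before reading further from the socket.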

Ensure that you are creating the socket with socket.SOCK_STREAM! That means the socket uses TCP, which ensures that your data arrives, and arrives in order (within reasonable limits of "ensuredness"). If problems persist, read on...

I would first base64 encode your file, in order to remove any quirkiness in the data. base64 uses a limited alphabet to encode the data. So adding a marker token that is outside of that alphabet is trivial and safe. You could literally do (pseudo-code):

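buf = bytearray()
while buf.count(b'$') < 2:   # wait for the opening and the closing marker
    l = c.recv(1024)
    if not l:                # connection closed early
        break
    buf += l                 # accumulate instead of overwriting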

And you just send "$<base64filecontent>$".
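Spelled out a little further, the marker approach might look like the sketch below. The names MARKER, send_framed and recv_framed are hypothetical. It assumes the standard base64 alphabet (which does not contain "$") and that only one framed payload is in flight at a time; note that base64 makes the payload roughly a third larger.

```python
import base64
import socket

MARKER = b"$"   # assumed marker; not part of the standard base64 alphabet

def send_framed(sock: socket.socket, data: bytes) -> None:
    """Send the data as $<base64>$."""
    sock.sendall(MARKER + base64.b64encode(data) + MARKER)

def recv_framed(sock: socket.socket) -> bytes:
    """Receive until both markers have arrived, then decode what lies between them."""
    buf = bytearray()
    while buf.count(MARKER) < 2:
        chunk = sock.recv(1024)
        if not chunk:
            raise ConnectionError("socket closed before the closing marker arrived")
        buf += chunk
    start = buf.index(MARKER) + 1          # first byte after the opening marker
    end = buf.index(MARKER, start)         # position of the closing marker
    return base64.b64decode(bytes(buf[start:end]))
```

Rescanning the whole buffer on every loop is simple but gets slow for large files; a real implementation would only search the newly received chunk.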

