python socket readline without socket.makefile()

I'm trying to parse an HTTP request line (e.g. 'GET / HTTP/1.1\r\n'), which is easy with the socket.makefile().readline() function (BaseHTTPRequestHandler uses it), like:

print sock.makefile().readline()

Unfortunately, as the documentation says, the socket must be in blocking mode when using makefile() (it cannot have a timeout). How can I implement a readline()-like function that does the same without the makefile() file object interface and without reading more than needed (as that would discard data I will need afterwards)?

A pretty inefficient example:

request_line = ""
while not request_line.endswith('\n'):
    request_line += sock.recv(1)
print request_line 

Four and a half years later, I would suggest asyncio's Streams for this, but here's how you might do it properly using BytesIO:

Note that this implementation "shrinks" the in-memory BytesIO object each time a line is detected. If you didn't care about that, this could be a lot fewer lines.

import socket
import time
from io import BytesIO

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(('localhost', 1234))
sock.setblocking(False)


def handle_line(line):
    # or, print("Line Received:", line.decode().rstrip())
    print(f"Line Received: {line.decode().rstrip()!r}")


with BytesIO() as buffer:
    while True:
        try:
            resp = sock.recv(100)       # Read in some number of bytes -- balance this
        except BlockingIOError:
            print("sleeping")           # Do whatever you want here, this just
            time.sleep(2)               #   illustrates that it's nonblocking
        else:
            if not resp:                # Empty bytes: the peer closed the connection
                break
            buffer.write(resp)          # Write to the BytesIO object
            buffer.seek(0)              # Set the file pointer to the SoF
            start_index = 0             # Count the number of characters processed
            for line in buffer:
                if not line.endswith(b"\n"):
                    break               # Trailing partial line: wait for more data
                start_index += len(line)
                handle_line(line)       # Do something with your line

            """ If we received any newline-terminated lines, this will be nonzero.
                In that case, we read the remaining bytes into memory, truncate
                the BytesIO object, reset the file pointer and re-write the
                remaining bytes back into it.  This will advance the file pointer
                appropriately.  If start_index is zero, the buffer doesn't contain
                any newline-terminated lines, so we set the file pointer to the
                end of the file to not overwrite bytes.
            """
            if start_index:
                buffer.seek(start_index)
                remaining = buffer.read()
                buffer.truncate(0)
                buffer.seek(0)
                buffer.write(remaining)
            else:
                buffer.seek(0, 2)
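
As the note before the code says, the version that does not bother shrinking a BytesIO object can be quite a bit shorter. Here is a minimal sketch of that idea, assuming a plain bytearray as the buffer. The helper name pop_lines is hypothetical, and the hard-coded chunks stand in for whatever sock.recv() would return on each call.

```python
def pop_lines(buffer: bytearray) -> list:
    """Remove and return every complete line in buffer.

    A trailing partial line (no newline yet) is left in the buffer.
    """
    lines = []
    while True:
        idx = buffer.find(b"\n")
        if idx == -1:
            return lines
        lines.append(bytes(buffer[:idx + 1]))
        del buffer[:idx + 1]            # Drop the bytes that were handed out


buffer = bytearray()
received = []
# Stand-ins for successive sock.recv(100) results (assumed data).
for chunk in (b"GET / HT", b"TP/1.1\r\nHost: exa", b"mple\r\n"):
    buffer += chunk
    received.extend(pop_lines(buffer))

print(received)
```

In the non-blocking loop above, the `buffer += chunk` line would take the place of the BytesIO bookkeeping, and each returned line would go to handle_line().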

(The original answer was so bad that it wasn't worth keeping (I promise), but it should be available in the edit history.)
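
For the asyncio Streams route mentioned at the top of this answer, something like the following might work. It is only a sketch: the loopback server, the 5 second timeout, and the handle_client name are assumptions made for the demo. The point is that StreamReader.readline() buffers for you, and asyncio.wait_for() supplies the timeout that makefile() could not.

```python
import asyncio


async def handle_client(reader, writer):
    # readline() keeps unread bytes buffered inside the StreamReader,
    # so nothing after the first line is lost.
    try:
        request_line = await asyncio.wait_for(reader.readline(), timeout=5.0)
    except asyncio.TimeoutError:
        request_line = b""              # No complete line arrived in time
    writer.write(b"got: " + request_line)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def main():
    # Port 0 asks the OS for any free port (demo assumption).
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(b"GET / HTTP/1.1\r\n")
        await writer.drain()
        reply = await reader.readline()
        writer.close()
        await writer.wait_closed()
    return reply


reply = asyncio.run(main())
print(reply)
```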

SocketStreamReader

Here is a (buffered) line-reader that does not use asyncio. It can be used as a "synchronous" socket-based replacement for asyncio.StreamReader.

import socket
from asyncio import IncompleteReadError  # only import the exception class


class SocketStreamReader:
    def __init__(self, sock: socket.socket):
        self._sock = sock
        self._recv_buffer = bytearray()

    def read(self, num_bytes: int = -1) -> bytes:
        raise NotImplementedError

    def readexactly(self, num_bytes: int) -> bytes:
        buf = bytearray(num_bytes)
        pos = 0
        while pos < num_bytes:
            n = self._recv_into(memoryview(buf)[pos:])
            if n == 0:
                raise IncompleteReadError(bytes(buf[:pos]), num_bytes)
            pos += n
        return bytes(buf)

    def readline(self) -> bytes:
        return self.readuntil(b"\n")

    def readuntil(self, separator: bytes = b"\n") -> bytes:
        if len(separator) != 1:
            raise ValueError("Only separators of length 1 are supported.")

        chunk = bytearray(4096)
        start = 0
        buf = bytearray(len(self._recv_buffer))
        bytes_read = self._recv_into(memoryview(buf))
        assert bytes_read == len(buf)

        while True:
            idx = buf.find(separator, start)
            if idx != -1:
                break

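            start = len(buf)  # resume the search where the previous one stopped
            bytes_read = self._recv_into(memoryview(chunk))
            if bytes_read == 0:  # connection closed before the separator arrived
                raise IncompleteReadError(bytes(buf), None)
            buf += memoryview(chunk)[:bytes_read]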

        result = bytes(buf[: idx + 1])
        self._recv_buffer = b"".join(
            (memoryview(buf)[idx + 1 :], self._recv_buffer)
        )
        return result

    def _recv_into(self, view: memoryview) -> int:
        bytes_read = min(len(view), len(self._recv_buffer))
        view[:bytes_read] = self._recv_buffer[:bytes_read]
        self._recv_buffer = self._recv_buffer[bytes_read:]
        if bytes_read == len(view):
            return bytes_read
        bytes_read += self._sock.recv_into(view[bytes_read:])
        return bytes_read

Usage:

reader = SocketStreamReader(sock)
line = reader.readline()
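
The readuntil() above only accepts single-byte separators. If you need something like b"\r\n", one possible approach is sketched below as a standalone function. The names read_until and FakeSocket are hypothetical; FakeSocket is just a stand-in that replays fixed chunks so the example runs without a network. The detail worth noticing is that the search restarts len(separator) - 1 bytes before the end of the buffer, because the separator may be split across two recv() calls.

```python
class FakeSocket:
    """Hypothetical stand-in for a socket: recv() replays fixed chunks."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    def recv(self, bufsize):
        # Ignores bufsize for simplicity; returns b"" once the data runs out.
        return self._chunks.pop(0) if self._chunks else b""


def read_until(sock, buffer: bytearray, separator: bytes = b"\r\n",
               chunk_size: int = 4096) -> bytes:
    """Return data up to and including separator; leftovers stay in buffer."""
    if not separator:
        raise ValueError("separator must not be empty")
    start = 0
    while True:
        idx = buffer.find(separator, start)
        if idx != -1:
            end = idx + len(separator)
            result = bytes(buffer[:end])
            del buffer[:end]
            return result
        # The separator may straddle two chunks, so back up a little.
        start = max(0, len(buffer) - len(separator) + 1)
        chunk = sock.recv(chunk_size)
        if not chunk:
            raise EOFError("connection closed before separator was received")
        buffer += chunk


sock = FakeSocket([b"GET / HTTP/1.1\r", b"\nHost: example\r\n"])
buffer = bytearray()
first = read_until(sock, buffer)
second = read_until(sock, buffer)
print(first, second)
```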

Here is my solution written in Python 3. In the example I use io.BytesIO.read() instead of socket.recv(), but the idea is the same:

CHUNK_SIZE = 16  # you can set it larger or smaller
buffer = bytearray()
while True:
  chunk = stream.read(CHUNK_SIZE)
  buffer.extend(chunk)
  if b'\n' in chunk or not chunk:
    break
end = buffer.find(b'\n')
firstline = buffer[:end] if end != -1 else buffer  # find() returns -1 when there is no newline

However, the rest of the message is partially in the buffer and partially waiting in the socket. You can either keep writing the content into the buffer and read from the buffer to have the entire request in one piece (it should be fine unless you parse huge requests), or you can wrap it with a generator and read it part by part:

def reader(buffer, stream):
  yield buffer[buffer.find(b'\n') + 1:]
  while True:
    chunk = stream.read(2048)
    if not chunk: break
    yield chunk
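
Putting the two snippets together, a runnable sketch might look like this. The function names read_first_line and iter_rest and the sample request bytes are assumptions for the demo, and io.BytesIO again stands in for the socket. Note that, as in the snippet above, the first line is cut at b'\n', so a trailing b'\r' is still attached.

```python
import io

CHUNK_SIZE = 16  # deliberately small so the first line spans two reads


def read_first_line(stream):
    """Return (first_line, leftover) where leftover was read past the newline."""
    buffer = bytearray()
    while True:
        chunk = stream.read(CHUNK_SIZE)
        buffer.extend(chunk)
        if b"\n" in chunk or not chunk:
            break
    end = buffer.find(b"\n")
    if end == -1:                       # stream ended without a newline
        return bytes(buffer), b""
    return bytes(buffer[:end]), bytes(buffer[end + 1:])


def iter_rest(leftover, stream, chunk_size=2048):
    """Yield the already-buffered bytes first, then the rest of the stream."""
    if leftover:
        yield leftover
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


stream = io.BytesIO(b"GET /index HTTP/1.1\r\nHost: example\r\n\r\nhello")
first_line, leftover = read_first_line(stream)
rest = b"".join(iter_rest(leftover, stream))
print(first_line, rest)
```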
