
How to best implement a server and client in Python?

I have a large file that needs to be loaded into memory; some operations are then performed on it based on user input. But I don't want to reload the file into memory every time there is a user input.

A solution might be to have one process load the data file and act as the "server", and have a separate client process query the server on behalf of the user.

I am wondering what the best client-server implementation for this is. I know that I could implement an HTTP server, but querying it means following the HTTP protocol, which has too much overhead (in my specific case, the client only needs to send a string to the server, so none of the HTTP headers are needed). A lighter solution is preferred. Also, both the client and the server run on the same machine, so sharing information through memory should be faster than going over the network, right?

Actually, the server could just load the data into memory as Python objects; if there were a way to access those Python objects from the client, that would be fine too.

Could anybody offer some advice on the best solution to solve this problem? Thanks.

Okay, so based on the comments, the data is keyed by string and values are lists or dictionaries, and the client requests an object by string.

Unfortunately, there's no safe, sane way to directly access that sort of data across processes without some intermediate serialization/deserialization step. An obvious choice, safety concerns aside, is pickling the values; msgpack is reasonable too.
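To make the serialization step concrete, here's a minimal round-trip sketch using the stdlib pickle module (the data shape and key names are made up for illustration; note that unpickling untrusted input is unsafe, while msgpack avoids that risk but only handles basic types):

```python
import pickle

# Hypothetical data: string keys mapping to lists/dicts, as described above.
data = {"user:1": {"name": "alice", "scores": [1, 2, 3]}}

# Serialize one value into bytes that can be sent over a socket...
payload = pickle.dumps(data["user:1"])

# ...and deserialize it on the receiving side.
assert pickle.loads(payload) == {"name": "alice", "scores": [1, 2, 3]}
```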

As for the protocol: if tried-and-tested HTTP is too slow for you, then for a simple request-response cycle like this, maybe just have the client send the key to retrieve (followed by a null character, a newline, or similar), have the server reply directly with the serialized object, and then close the connection.

You might also want to consider simply storing the serialized data in a database, be it SQLite or something else.
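For the database route, a sketch of what that could look like with the stdlib sqlite3 module, storing pickled values in a simple key/value table (the schema and helper names here are my own invention, not from the question):

```python
import pickle
import sqlite3

# Hypothetical schema: one table mapping string keys to serialized blobs.
conn = sqlite3.connect(":memory:")  # use a file path to persist across runs
conn.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value BLOB)")


def put(key, obj):
    # Serialize the object and upsert it under the given key.
    conn.execute(
        "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
        (key, pickle.dumps(obj)),
    )
    conn.commit()


def get(key):
    # Fetch and deserialize, or return None if the key is absent.
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return pickle.loads(row[0]) if row else None


put("user:1", {"name": "alice", "scores": [1, 2, 3]})
print(get("user:1"))
```

With a file-backed database, the "server" process becomes unnecessary for reads; SQLite handles concurrent readers on the same machine.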


EDIT: I decided to experiment a little. Here's a small, pretty naive asyncio + msgpack based server + client that does the trick:

server.py

import asyncio
import random
import msgpack
import time
from functools import lru_cache


def generate_dict(depth=6, min_keys=1, max_keys=10):
    d = {}
    for x in range(random.randint(min_keys, max_keys)):
        d[x] = (
            generate_dict(
                depth=depth - 1, min_keys=min_keys, max_keys=max_keys
            )
            if depth
            else "foo" * (x + 1)
        )
    return d


DATA = {f"{x}": generate_dict() for x in range(10)}


@lru_cache(maxsize=64)
def get_encoded_data(key):
    # TODO: this does not clear the cache upon DATA being mutated
    return msgpack.packb(DATA.get(key))


async def handle_message(reader, writer):
    t0 = time.time()
    # Assume the key fits in a single read of up to 256 bytes.
    data = await reader.read(256)
    key = data.decode()
    addr = writer.get_extra_info("peername")
    print(f"Sending key {key!r} to {addr!r}...", end="")
    value = get_encoded_data(key)
    print(f"{len(value)} bytes...", end="")
    writer.write(value)
    await writer.drain()
    # Closing the connection signals end-of-payload to the client.
    writer.close()
    await writer.wait_closed()
    t1 = time.time()
    print(f"{t1 - t0} seconds.")


async def main():
    server = await asyncio.start_server(handle_message, "127.0.0.1", 8888)

    addr = server.sockets[0].getsockname()
    print(f"Serving on {addr}")

    async with server:
        await server.serve_forever()


asyncio.run(main())

client.py

import socket
import msgpack
import time


def get_key(key):
    t0 = time.time()
    s = socket.socket()
    s.connect(("127.0.0.1", 8888))
    s.sendall(str(key).encode())
    buf = []
    # Read until the server closes the connection, which marks
    # the end of the serialized payload.
    while True:
        chunk = s.recv(65535)
        if not chunk:
            break
        buf.append(chunk)
    s.close()
    val = msgpack.unpackb(b"".join(buf))
    t1 = time.time()
    print(key, (t1 - t0))
    return val


t0 = time.time()
n = 0
for i in range(10):
    for x in range(10):
        assert get_key(x)
        n += 1
t1 = time.time()
print("total", (t1 - t0), "/", n, ":", (t1 - t0) / n)

On my Mac,

  • it takes about 0.02814 seconds per message on the receiving end, for a single-consumer throughput of 35 requests per second.
  • it takes about 0.00241 seconds per message on the serving end, for a throughput of 413 requests per second.

(And as you can see from how the DATA is generated, the payloads can be quite large.)

Hope this helps.
