
Python TCP socket for a lot of data

Original 2015-11-25 17:25:21 8 2 python/ sockets/ tcp

We (as a project group) are currently stuck on how to handle live data sent to our server.

We receive data updates every second and would like to insert them into our database (security is currently not an issue, because it is a school project). The problem is this: we tried Python's SocketServer and asyncio to create a TCP server that the data can be sent to.

We got this working with different libraries, but we are stuck on one thing: if we keep the connection to the client open (in this case, hardware that sends data every second), we can't split the individual JSON or XML messages apart. They all arrive run together.

We know why this happens: TCP is a byte stream that only guarantees ordering, not message boundaries.

Any thoughts on how to handle this, so that every message sent is split from the others?

Recreating the socket for every message isn't the right option, if I recall correctly.

2 solutions

What you will have to do is ensure that there is a clear delimiter for each message. For example, the first 6 characters of every message could be the length of the message: whatever reads from the socket decodes the length, then reads that number of bytes, and passes the data to whatever needs it. Another way, if there is a character/byte which never appears in the content, is to send it immediately before each message: for example, control-A (binary value 1) could be the lead-in character, with control-B (binary value 2) sent as the lead-out. Again, the server looks for these characters framing a message.
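Both ideas above can be sketched as small functions that take whatever bytes have arrived so far and hand back the complete messages plus any leftover bytes. This is only a sketch under some assumptions: the function names are made up for illustration, the length prefix is taken to be 6 zero-padded ASCII digits, and control-A/control-B are assumed never to appear inside a message.

```python
SOH = b"\x01"  # control-A: assumed lead-in byte
STX = b"\x02"  # control-B: assumed lead-out byte
LENGTH_DIGITS = 6  # assumed width of the ASCII length prefix


def frame_with_length(payload: bytes) -> bytes:
    """Prefix the payload with its length as 6 zero-padded ASCII digits."""
    if len(payload) > 10 ** LENGTH_DIGITS - 1:
        raise ValueError("payload too large for a 6-digit length prefix")
    return f"{len(payload):0{LENGTH_DIGITS}d}".encode("ascii") + payload


def split_length_prefixed(buffer: bytes):
    """Return (complete_messages, leftover_bytes) for a length-prefixed stream."""
    messages = []
    while len(buffer) >= LENGTH_DIGITS:
        length = int(buffer[:LENGTH_DIGITS])
        end = LENGTH_DIGITS + length
        if len(buffer) < end:
            break  # the rest of this message has not arrived yet
        messages.append(buffer[LENGTH_DIGITS:end])
        buffer = buffer[end:]
    return messages, buffer


def split_delimited(buffer: bytes):
    """Return (complete_messages, leftover_bytes) for control-A ... control-B framing."""
    messages = []
    while True:
        start = buffer.find(SOH)
        if start == -1:
            return messages, b""  # no lead-in seen; drop stray bytes
        end = buffer.find(STX, start + 1)
        if end == -1:
            return messages, buffer[start:]  # keep the partial message
        messages.append(buffer[start + 1:end])
        buffer = buffer[end + 1:]
```

On the server you would append each `recv()` result to the leftover bytes from the previous call and run the splitter again, so a message that arrives in two pieces is still delivered whole.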

If you can't change the client side (the thing sending the data), then you are going to have to parse the input. You can't just add a delimiter to something that you don't control.
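For the JSON case, parsing the input could look roughly like the sketch below. It assumes the hardware sends JSON objects (or arrays) back to back with nothing but optional whitespace between them, and that the received bytes have already been decoded to text; `extract_json_messages` is a hypothetical helper name. It relies on `json.JSONDecoder.raw_decode`, which decodes one document and reports where it ended.

```python
import json

_decoder = json.JSONDecoder()


def extract_json_messages(buffer: str):
    """Return (decoded_objects, leftover_text) from back-to-back JSON documents."""
    objects = []
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(buffer) and buffer[pos].isspace():
            pos += 1
        if pos == len(buffer):
            break
        try:
            obj, pos = _decoder.raw_decode(buffer, pos)
        except json.JSONDecodeError:
            break  # incomplete (or invalid) document: wait for more data
        objects.append(obj)
    return objects, buffer[pos:]
```

One caveat: this sketch cannot tell a half-received message from a corrupt one, so in practice you would probably want a size limit on the leftover text and drop the connection when it is exceeded.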

An alternative is to use a header that encodes the size of the message that will be sent. Let's say you use a header of 4 bytes. The client first sends the server a header with the size of the message to come. The client then sends the message (up to about 4 GiB). The server knows that it must first read 4 bytes (the header). It calculates the size n that the header contains, then reads n bytes from the socket buffer. You are guaranteed to have read only your message. Using special delimiters is dangerous, as you MUST know all possible values that a client can send.
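A minimal sketch of the 4-byte header approach, using only the standard `socket` and `struct` modules. The helper names (`send_message`, `recv_exact`, `recv_message`) are made up for illustration, and the header is assumed to be an unsigned big-endian integer. The important detail is that `recv(n)` may return fewer than n bytes, so the read has to loop until everything has arrived.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # assumed format: 4-byte unsigned big-endian length


def send_message(sock: socket.socket, payload: bytes) -> None:
    """Send the 4-byte length header followed by the payload."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, looping because recv() may return fewer."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed before the full message arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_message(sock: socket.socket) -> bytes:
    """Read one header, then exactly as many payload bytes as it announces."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


if __name__ == "__main__":
    # Local demonstration with a connected socket pair instead of real hardware.
    client, server = socket.socketpair()
    with client, server:
        send_message(client, b'{"reading": 1}')
        send_message(client, b'{"reading": 2}')
        print(recv_message(server))
        print(recv_message(server))
```

Even though both messages are sitting in the same socket buffer, each `recv_message` call returns exactly one of them.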

It really depends on the type of data you are receiving, the type of connection, the latency, and so on. If there is a pause of 1 second between packets and your connection is consistent, you could probably get away with first reading the entire buffer once to clear it, then, as soon as data is available, reading it and clearing the buffer again. It's not a great approach, but it might work for what you need, with no parsing involved.