
Sending large data over TCP/IP socket

I have a small project running a server in C# and a client in Java. The server sends images to the client. Some images are quite big (up to 10 MiB sometimes), so I split the image bytes and send them in chunks of 32768 bytes each. My C# server code is as follows:

using (var stream = new MemoryStream(ImageData))
{
   for (int j = 1; j <= dataSplitParameters.NumberOfChunks; j++)
   {
      byte[] chunk;
      if (j == dataSplitParameters.NumberOfChunks)
         chunk = new byte[dataSplitParameters.FinalChunkSize];
      else
         chunk = new byte[dataSplitParameters.ChunkSize];

      int result = stream.Read(chunk, 0, chunk.Length);

      string line = DateTime.Now + ", Status OK, " + ImageName + ", ImageChunk, " + j + ", " + dataSplitParameters.NumberOfChunks + ", " + chunk.Length;

      //write read params
      streamWriter.WriteLine(line);
      streamWriter.Flush();
      
      //write the data
      binaryWriter.Write(chunk);
      binaryWriter.Flush();
      Console.WriteLine(line);

      string deliveryReport = streamReader.ReadLine();
      Console.WriteLine(deliveryReport);
   }
}

And my Java client code is as follows:

long dataRead = 0;
for (int j = 1; j <= numberOfChunks; j++) {
    String line = bufferedReader.readLine();
    tokens = line.split(", ");
    System.out.println(line);

    int toRead = Integer.parseInt(tokens[tokens.length - 1]);
    byte[] chunk = new byte[toRead];
    int read = inputStream.read(chunk, 0, toRead);
    //do something with the data
    dataRead += read;

    String progressReport = pageLabel + ", progress: " + dataRead + "/" + dataLength + " bytes.";
    bufferedOutputStream.write((progressReport + "\n").getBytes());
    bufferedOutputStream.flush();

    System.out.println(progressReport);
}

The problem is that when I run the code, either the client crashes with an error saying it is reading bogus data, or both the client and the server hang. This is the error:

Document Page 1, progress: 49153/226604 bytes.
�9��%>�YI!��F�����h�
Exception in thread "main" java.lang.NumberFormatException: For input string: .....

What am I doing wrong?

The basic problem.

Once you wrap an InputStream in a BufferedReader, you must stop accessing the InputStream directly. A BufferedReader is buffered: it reads as much data from the underlying stream as it wants to. It is NOT limited to reading exactly up to the next newline and stopping there.

The BufferedReader on the Java side has read a lot more than that one line, so it has already consumed a whole bunch of image data, and there's no way to get it back from here. By creating that BufferedReader, you've made the job impossible, so you can't do that.
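To see the over-read for yourself, here is a minimal Java sketch that feeds a hypothetical header line followed by 32768 payload bytes through an InputStreamReader/BufferedReader, then asks the raw stream how much is still left. The names OverReadDemo, sampleWire and bytesLeftAfterReadLine are made up for illustration, and the exact number of bytes swallowed depends on the JDK's internal buffer sizes; the point is only that it is far more than the header line.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class OverReadDemo {
    // Builds a hypothetical "header line + payload" message;
    // the zero bytes stand in for image data.
    static byte[] sampleWire(int payloadSize) {
        byte[] header = ("Document Page 1, ImageChunk, 1, 7, " + payloadSize + "\n")
                .getBytes(StandardCharsets.UTF_8);
        byte[] wire = new byte[header.length + payloadSize];
        System.arraycopy(header, 0, wire, 0, header.length);
        return wire;
    }

    // Returns how many bytes are still unread in the raw stream after a
    // single readLine() call on a BufferedReader wrapped around it.
    static int bytesLeftAfterReadLine(byte[] wire) throws IOException {
        InputStream raw = new ByteArrayInputStream(wire);
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(raw, StandardCharsets.UTF_8));
        reader.readLine();      // we only asked for the header line...
        return raw.available(); // ...but far more than that has left `raw`
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = sampleWire(32768);
        System.out.println("bytes on the wire:         " + wire.length);
        System.out.println("left after one readLine(): " + bytesLeftAfterReadLine(wire));
    }
}
```

Whatever the reader pulled in beyond the newline now sits in its private buffer, which is exactly the image data the question's `inputStream.read` call expected to find.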

The underlying problem.

You have a single TCP/IP connection. On it, you send some irrelevant text (the page, the progress, etc.), then an unknown amount of image data, and then another irrelevant progress update.

That's fundamentally broken. How could an image parser possibly know that, halfway through an image, a status update line shows up? Text is just binary data too; there is no magic marker that tells the client "this byte is part of the image data, but that byte is some progress text sent in between".

The simple fix.

You'd think the simple fix is... well, stop doing that, then? Why are you sending this progress at all? The client is perfectly capable of knowing how many bytes it has read; there is no point in sending that. Just take your binary data, open the OutputStream, and send all of it. On the client side, open the InputStream and read all of it. Don't involve strings, and don't use anything that smacks of "works with characters" (so, BufferedReader? No. BufferedInputStream is fine).
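A minimal sketch of that approach, assuming Java 9+ (for try-with-resources on an existing variable and `InputStream.transferTo`). `RawTransfer`, `send` and `receive` are hypothetical names; in a real program `out` would be `socket.getOutputStream()` on the sender and `in` would be `socket.getInputStream()` on the receiver.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class RawTransfer {
    // Sender: write every byte of the image, then close the stream.
    // For a socket, closing its OutputStream closes the connection, which is
    // what tells the receiver that the file is complete.
    static void send(byte[] image, OutputStream out) throws IOException {
        try (out) {
            out.write(image);
        }
    }

    // Receiver: copy everything until read() reports end-of-stream (-1),
    // i.e. until the sender closes the connection. Returns the byte count.
    static long receive(InputStream in, OutputStream sink) throws IOException {
        try (in) {
            return in.transferTo(sink);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] image = new byte[10 * 1024 * 1024]; // stand-in for a 10 MiB image
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        send(image, wire);
        ByteArrayOutputStream received = new ByteArrayOutputStream();
        long n = receive(new ByteArrayInputStream(wire.toByteArray()), received);
        System.out.println("received " + n + " bytes");
    }
}
```

Since end-of-file is signalled by closing the connection, this carries exactly one file per connection and nothing else.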

... but now the client knows neither the title nor the total size!

So make a wire protocol. It can be nearly trivial.

This is your wire protocol:

  1. 4 bytes, big-endian: SizeOfName
  2. SizeOfName bytes: the UTF-8 encoded document title.
  3. 4 bytes, big-endian: SizeOfData
  4. SizeOfData bytes: the image data.
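For the sending side, a writer for exactly this layout could look like the sketch below. It is written in Java to match the reader code in this answer (`WireWriter` and `writeMessage` are hypothetical names); `DataOutputStream.writeInt` always writes 4 bytes in big-endian order. If you port it to the C# server, keep in mind that `BinaryWriter.Write(int)` writes little-endian, so the length fields would have to be converted (for example with `BinaryPrimitives.WriteInt32BigEndian`) to match this protocol.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class WireWriter {
    // Writes one message in the 4-part format: SizeOfName, name, SizeOfData, data.
    static void writeMessage(OutputStream rawOut, String title, byte[] image) throws IOException {
        DataOutputStream out = new DataOutputStream(rawOut);
        byte[] name = title.getBytes(StandardCharsets.UTF_8);
        out.writeInt(name.length);  // 1. SizeOfName, 4 bytes big-endian
        out.write(name);            // 2. UTF-8 document title
        out.writeInt(image.length); // 3. SizeOfData, 4 bytes big-endian
        out.write(image);           // 4. image data
        out.flush();                // not closed, so the caller's stream stays open
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        writeMessage(wire, "Document Page 1", new byte[] {1, 2, 3});
        System.out.println(wire.size() + " bytes written"); // prints "26 bytes written"
    }
}
```

The 26 bytes are 4 (name length) + 15 (title) + 4 (data length) + 3 (data).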

And that's only if you actually want the client to be able to render a progress bar and to know the title. If that's not needed, don't do any of that: just send the bytes straight up, and signal that the file has been completely sent by... closing the connection.

Here's some sample Java code:

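import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

try (InputStream in = ....) {
  int nameSize = readInt(in);
  byte[] nameBytes = in.readNBytes(nameSize);
  if (nameBytes.length != nameSize) throw new IOException("Early end-of-stream");
  String name = new String(nameBytes, StandardCharsets.UTF_8);
  int dataSize = readInt(in);
  try (OutputStream out =
    Files.newOutputStream(Paths.get("/Users/TriSky/image.png"))) {

    byte[] buffer = new byte[65536];
    while (dataSize > 0) {
      // Never ask for more than what is left of the image, so bytes the
      // sender puts on the wire after it are not swallowed here.
      int r = in.read(buffer, 0, Math.min(buffer.length, dataSize));
      if (r == -1) throw new IOException("Early end-of-stream");
      out.write(buffer, 0, r);
      dataSize -= r;
    }
  }
}

// Reads a 4-byte big-endian int (ByteBuffer's default byte order).
public int readInt(InputStream in) throws IOException {
    byte[] b = in.readNBytes(4);
    if (b.length < 4) throw new IOException("Early end-of-stream");
    return ByteBuffer.wrap(b).getInt();
}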

Closing notes

Another bug in your app is that you're using the wrong method. Java's read(bytes) method will NOT necessarily fill that byte array. All read(byte[]) does is read at least 1 byte (unless the stream has been closed, in which case it reads nothing and returns -1). The idea is that read reads the optimal number of bytes: exactly as many as are ready to give you right now. How many is that? Who knows. If you ignore the return value of in.read(bytes), your code is necessarily broken, and that is exactly what you're doing. What you really want is, for example, readNBytes, which guarantees that it fills the byte array completely (or reads until the stream ends, whichever happens first).
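The difference is easy to reproduce without a network. The sketch below uses a hypothetical `TrickleInputStream` that hands out at most 1460 bytes per read call, imitating a socket on which only part of the data has arrived so far (1460 is just an illustrative number). A plain `read` returns after the first small batch, while `readNBytes` (Java 11+) keeps reading until the array is full.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stream that returns at most maxPerRead bytes per read call,
// imitating a socket on which only part of the data has arrived so far.
class TrickleInputStream extends FilterInputStream {
    private final int maxPerRead;

    TrickleInputStream(InputStream in, int maxPerRead) {
        super(in);
        this.maxPerRead = maxPerRead;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return super.read(b, off, Math.min(len, maxPerRead));
    }
}

class ShortReadDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = new byte[32768];

        // read() returns as soon as *some* bytes are available.
        InputStream a = new TrickleInputStream(new ByteArrayInputStream(data), 1460);
        byte[] chunk = new byte[data.length];
        int n = a.read(chunk, 0, chunk.length);
        System.out.println("read() filled " + n + " of " + chunk.length + " bytes");

        // readNBytes() loops internally until it has all the bytes (or EOF).
        InputStream b = new TrickleInputStream(new ByteArrayInputStream(data), 1460);
        byte[] full = b.readNBytes(data.length);
        System.out.println("readNBytes() returned " + full.length + " bytes");
    }
}
```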

Note that in the transfer code above I also use the basic read, but there I don't ignore the return value.

Your Java code seems to be using a BufferedReader. It reads data into a buffer of its own, which means that data is no longer available in the underlying socket input stream. That's your first problem. Your second problem is how inputStream.read is used: it's not guaranteed to read all the bytes you ask for, so you would have to put a loop around it.
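A sketch of such a loop, under the hypothetical name `readExactly`; it does the same job as `DataInputStream.readFully` or `readNBytes`. Note that it only fixes the second problem: it still has to be called on a stream that no BufferedReader has been reading from.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class ReadLoop {
    // Keeps calling read() until buf is completely filled.
    // Throws EOFException if the peer closes the connection first.
    static void readExactly(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int r = in.read(buf, off, buf.length - off);
            if (r == -1) {
                throw new EOFException("stream ended after " + off + " of " + buf.length + " bytes");
            }
            off += r; // r may be anything from 1 to the remaining length
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] chunk = new byte[32768];
        readExactly(new ByteArrayInputStream(new byte[32768]), chunk);
        System.out.println("filled " + chunk.length + " bytes");
    }
}
```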

This is not a particularly easy problem to solve. When you mix binary and text data in the same stream, it is difficult to read back. In Java, there is a class called DataInputStream that can help a little: it has a readLine method to read a line of text, and also methods to read binary data:

DataInputStream dataInput = new DataInputStream(inputStream);

for (int j = 1; j <= numberOfChunks; j++) {
    String line = dataInput.readLine();
    ...
    byte[] chunk = new byte[toRead];
    // readFully returns void: it blocks until chunk is full,
    // or throws EOFException if the stream ends first.
    dataInput.readFully(chunk);
    ...
}

DataInputStream has limitations: its readLine method is deprecated because it does not properly convert bytes to characters (it effectively assumes the text is encoded in Latin-1) and does not let you use a different text encoding. If you want to go further down this road, you'll want to create a class of your own to read your stream format.
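One possible shape for such a class, as a sketch only: `HeaderAndChunkReader`, `readUtf8Line` and `readChunk` are hypothetical names, and the assumed format is the question's "one text line, then exactly N raw bytes". It buffers bytes, never characters, so as long as every read goes through this one object nothing is decoded ahead of time. Lines are decoded as UTF-8 and a trailing `\r` is dropped, because on Windows a C# `StreamWriter.WriteLine` ends lines with `\r\n`.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical reader for a "text header line, then exactly N raw bytes" format.
// All reads, text and binary, must go through this one object.
class HeaderAndChunkReader {
    private final DataInputStream in;

    HeaderAndChunkReader(InputStream raw) {
        // Buffering bytes is fine: nothing is turned into characters early.
        this.in = new DataInputStream(new BufferedInputStream(raw));
    }

    // Reads bytes up to '\n' and decodes them as UTF-8, dropping a trailing '\r'.
    // Returns null if the stream ends before any byte of a new line.
    String readUtf8Line() throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            line.write(b);
        }
        if (b == -1 && line.size() == 0) {
            return null;
        }
        byte[] bytes = line.toByteArray();
        int len = bytes.length;
        if (len > 0 && bytes[len - 1] == '\r') {
            len--;
        }
        return new String(bytes, 0, len, StandardCharsets.UTF_8);
    }

    // Reads exactly n raw bytes, or throws EOFException if the stream ends first.
    byte[] readChunk(int n) throws IOException {
        byte[] chunk = new byte[n];
        in.readFully(chunk);
        return chunk;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        wire.write("Document Page 1, Status OK, img, ImageChunk, 1, 1, 3\r\n"
                .getBytes(StandardCharsets.UTF_8));
        wire.write(new byte[] {10, 13, 0}); // binary payload that happens to contain \n and \r
        HeaderAndChunkReader reader =
                new HeaderAndChunkReader(new ByteArrayInputStream(wire.toByteArray()));

        String header = reader.readUtf8Line();
        String[] tokens = header.split(", ");
        byte[] chunk = reader.readChunk(Integer.parseInt(tokens[tokens.length - 1]));
        System.out.println(header + " -> " + chunk.length + " bytes");
    }
}
```

The progress reports the client sends back would still be written to the socket's OutputStream as before; only the reading side changes.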

Some images are quite big (up to 10MiB sometimes), so I split the image bytes and send it in chunks of 32768 bytes each.

You know this is totally unnecessary, right? There is absolutely no problem with writing multiple megabytes of data to a TCP socket and streaming all of it in on the receiving side.

When you send an image, open it as a normal file, split the bytes into chunks, and Base64-encode each chunk before sending it; the client then decodes it. Image data is binary rather than ordinary text, so Base64 encoding turns those bytes into plain characters like AfHM65Hkgf7MM.
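A minimal Java sketch of that idea, assuming each chunk is sent as one Base64 line (`Base64Lines`, `encode` and `decode` are hypothetical names). Base64 output only uses the characters A-Z, a-z, 0-9, `+`, `/` and `=`, so it never contains a newline, which makes a line-based reader such as BufferedReader safe again. The trade-off is size: Base64 turns every 3 bytes into 4 characters, so the data grows by about a third on the wire.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.Base64;

class Base64Lines {
    // Sender side: encode each chunk as one Base64 line.
    static String encode(byte[] image, int chunkSize) {
        Base64.Encoder encoder = Base64.getEncoder(); // basic encoder: no line breaks added
        StringBuilder wire = new StringBuilder();
        for (int off = 0; off < image.length; off += chunkSize) {
            int end = Math.min(off + chunkSize, image.length);
            wire.append(encoder.encodeToString(Arrays.copyOfRange(image, off, end))).append('\n');
        }
        return wire.toString();
    }

    // Receiver side: read line by line and decode each line back into bytes.
    static byte[] decode(BufferedReader reader) throws IOException {
        Base64.Decoder decoder = Base64.getDecoder();
        ByteArrayOutputStream image = new ByteArrayOutputStream();
        String line;
        while ((line = reader.readLine()) != null) {
            image.write(decoder.decode(line));
        }
        return image.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] image = new byte[100_000];
        String wire = encode(image, 32768);
        byte[] back = decode(new BufferedReader(new StringReader(wire)));
        System.out.println(image.length + " bytes -> " + wire.length() + " chars -> " + back.length + " bytes");
    }
}
```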

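Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.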

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM