Data is corrupted after sending from PHP to Java over TCP

I am trying to send data from a PHP TCP server to a Java TCP client. I am checking the results by comparing the hex values of the data.

The PHP script reads STDIN and sends it through the socket one byte at a time; the Java client reads it with DataInputStream.read(), converts it to hex and displays it.

If I type data into the script manually, it works OK. If I use a file with data, it also works OK. But when I feed it /dev/urandom (even just a few bytes), the data arrives corrupted on the Java side: the hex value efbfbd always shows up in random places instead of the correct data. Please help me with this issue. PHP code:

$f = fopen('php://stdin', 'rb');
// Compare against '' explicitly: the one-byte string "0" is falsy in PHP
while (($line = fread($f, 1)) !== false && $line !== '') {
    $length = strlen($line);
    echo bin2hex($line) . "\n";

    // Write each byte exactly once ($client is the accepted socket, setup not shown)
    while ($length > 0) {
        $sent = socket_write($client, $line, $length);
        if ($sent === false) {
            break 2;
        }
        // If only part of the message was sent, keep the part not yet sent
        $line = substr($line, $sent);
        $length -= $sent;
    }
}

Java code:

in = new DataInputStream(clientSocket.getInputStream());

byte[] data = new byte[1];

int count = 0;
while (in.available() > 0) {
    //System.out.println(in.available());
    in.read(data);
    String message = new String(data);

    System.out.println(message);
    //System.out.flush();

    System.out.println(toHex(message));
    //in.flush();
    message = "";
}

You're running into a character-encoding problem. Calling new String(data) converts the byte array to a string using the platform's default charset, whatever that happens to be (you can set it, for example to UTF-8, with java -Dfile.encoding=UTF-8).
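To illustrate why the charset matters, here is a minimal sketch (the class and method names are made up for this example) that decodes the same single byte, 0xE9, with two explicit charsets. Passing a StandardCharsets constant makes the result independent of whatever default the JVM picked up.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Decode with an explicit charset instead of relying on the platform default
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] data = { (byte) 0xE9 };
        String latin1 = decode(data, StandardCharsets.ISO_8859_1);
        String utf8 = decode(data, StandardCharsets.UTF_8);
        // ISO-8859-1 maps every byte to one char: 0xE9 becomes U+00E9
        System.out.printf("ISO-8859-1: U+%04X%n", (int) latin1.charAt(0));
        // 0xE9 alone is an incomplete UTF-8 sequence, so it becomes U+FFFD
        System.out.printf("UTF-8:      U+%04X%n", (int) utf8.charAt(0));
        System.out.println("Default charset here: " + Charset.defaultCharset());
    }
}
```

The last line shows that the default varies between environments, which is why the same program can behave differently on different machines.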

The Java code you want would most likely look like the following:

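    in = new DataInputStream(clientSocket.getInputStream());

    byte[] data = new byte[1];
    int n;
    // Loop until read() reports end of stream (-1) instead of trusting available()
    while ((n = in.read(data)) != -1) {
        // Mask with 0xFF so bytes 0x80..0xFF print as 80..ff, not ffffff80..ffffffff
        String hexMessage = Integer.toHexString(data[0] & 0xFF);
        // If you do need text, name the charset explicitly (java.nio.charset.StandardCharsets):
        // String stringMessage = new String(data, 0, n, StandardCharsets.UTF_8); // US_ASCII, ISO_8859_1, ...
        System.out.println(hexMessage);
    }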

Update: I missed the 32-bit issue. Java's byte is a signed 8-bit type, so when it is passed to Integer.toHexString(int) it is sign-extended to a 32-bit int. Masking the byte with 0xFF undoes that sign extension.
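A small sketch of the sign-extension effect (the class and helper names are illustrative only), comparing the unmasked and masked conversions of the byte 0x80:

```java
public class SignExtensionDemo {
    // The byte is widened to int first, so negative bytes gain 24 leading 1 bits
    static String unmaskedHex(byte b) {
        return Integer.toHexString(b);
    }

    // & 0xFF keeps only the low 8 bits, giving a value in 0..255
    static String maskedHex(byte b) {
        return Integer.toHexString(b & 0xFF);
    }

    public static void main(String[] args) {
        byte b = (byte) 0x80; // -128 as a signed Java byte
        System.out.println(unmaskedHex(b)); // ffffff80
        System.out.println(maskedHex(b));   // 80
    }
}
```

Note that Integer.toHexString does not pad, so 0x05 prints as "5"; String.format("%02x", b & 0xFF) gives a fixed two-digit form if you need one.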

There are two main issues with your Java program.

First, the use of in.available(). It does not tell you how many bytes are still left in the message; it only says how many bytes can be read right now without blocking. For example, suppose the server sends two packets of 200 bytes each over the socket (just an example), and the first has arrived while the second is still in transit. The first call returns 200, and reading those 200 bytes is guaranteed not to block. But if the second packet hasn't arrived yet, your next call to in.available() returns 0. If you stop at that point, you only have half the data, which is not what you wanted.

Typically you either read until you reach end of stream (InputStream.read() returns -1), after which the stream is exhausted and you close the socket, or you use a protocol that tells you how many bytes to expect, and you read exactly that many bytes.
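Both approaches can be sketched as follows. This is illustrative only: the class and method names are hypothetical, and the framing in readFrame (a 4-byte big-endian length followed by the payload) is an assumed protocol, not something the question's PHP server sends. On the PHP side it would mean writing something like pack('N', strlen($data)) before the data. In real use the stream would be clientSocket.getInputStream(); ByteArrayInputStream stands in for the socket here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadStrategies {
    // Strategy 1: read until end of stream (read() returns -1)
    static byte[] readToEnd(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n); // keep only the n bytes actually read
        }
        return out.toByteArray();
    }

    // Strategy 2 (assumed framing): 4-byte big-endian length, then the payload
    static byte[] readFrame(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] payload = new byte[length];
        in.readFully(payload); // blocks until all bytes arrive, or throws EOFException
        return payload;
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = { 0x44, (byte) 0xFF, 0x00, (byte) 0x80 };
        System.out.println(readToEnd(new ByteArrayInputStream(sample)).length); // 4

        // Build a framed message the way the assumed sender would
        ByteArrayOutputStream framed = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(framed);
        dos.writeInt(sample.length);
        dos.write(sample);
        dos.flush();
        DataInputStream din = new DataInputStream(new ByteArrayInputStream(framed.toByteArray()));
        System.out.println(readFrame(din).length); // 4
    }
}
```

On Java 9 and later, InputStream.readAllBytes() does the same job as readToEnd.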


But that's not the reason for the strange values you see in your program's output. The reason is that Java and PHP represent strings completely differently. In PHP, a string can contain arbitrary bytes, and how they are interpreted as characters is up to the programmer.

This basically means that a PHP string is the equivalent of a byte[] in Java.

But Java Strings are completely different. Conceptually, a String is a sequence of char values, and each char is a 16-bit UTF-16 code unit. When you convert the bytes you read into a Java String, they are always decoded using some character encoding, so that the corresponding characters are stored in the string.

For example, if your bytes are 44 4F 4C 4C and the character encoding is ISO-8859-1, they will be interpreted as the characters D, O, L, L (U+0044, U+004F, U+004C, U+004C): a string of four characters, "DOLL". But if your character encoding is UTF-16, the bytes will be interpreted as U+444F and U+4C4C: a string of only two characters, "䑏䱌".
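The "DOLL" example can be reproduced directly. This sketch (class and method names are only for illustration) decodes the same four bytes with both charsets; Java's UTF-16 decoder assumes big-endian when there is no byte-order mark.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DollDemo {
    static final byte[] BYTES = { 0x44, 0x4F, 0x4C, 0x4C };

    static String decodeAs(Charset charset) {
        return new String(BYTES, charset);
    }

    public static void main(String[] args) {
        // One byte per character: D, O, L, L
        String latin1 = decodeAs(StandardCharsets.ISO_8859_1);
        // Two bytes per char, big-endian because there is no BOM
        String utf16 = decodeAs(StandardCharsets.UTF_16);

        System.out.println(latin1 + " (" + latin1.length() + " chars)"); // DOLL (4 chars)
        for (char c : utf16.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c); // U+444F, then U+4C4C
        }
    }
}
```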

When you were reading from the console or from a file, the data was probably already in the encoding Java expects by default. That is usually the case when the text is plain English, with just English letters, spaces and punctuation. These are all 7-bit ASCII characters, which are encoded identically in ISO-8859-1 and UTF-8, the common defaults. But /dev/urandom also produces bytes in the range 80 through FF, and how those are decoded into a Java string depends on the charset: in UTF-8, many such bytes and byte sequences are simply invalid.
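This also explains the efbfbd in the question: EF BF BD is the UTF-8 encoding of U+FFFD, the replacement character that the UTF-8 decoder substitutes for invalid bytes. The sketch below assumes, hypothetically, that the default charset is UTF-8 and that the unshown toHex() re-encodes the String as UTF-8; roundTripHex is a made-up stand-in for that path.

```java
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Hypothetical stand-in for new String(data) followed by toHex(message)
    static String roundTripHex(byte[] received) {
        String message = new String(received, StandardCharsets.UTF_8);
        StringBuilder hex = new StringBuilder();
        for (byte b : message.getBytes(StandardCharsets.UTF_8)) {
            hex.append(String.format("%02x", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // 7-bit ASCII bytes survive the round trip unchanged
        System.out.println(roundTripHex(new byte[] { 0x41, 0x42 })); // 4142
        // A lone 0x9C is invalid UTF-8: it becomes U+FFFD, re-encoded as ef bf bd
        System.out.println(roundTripHex(new byte[] { 0x41, (byte) 0x9C })); // 41efbfbd
    }
}
```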

Furthermore, you didn't show your toHex() method. It probably extracts bytes from the string again, but using which encoding? If you decode the bytes into the String using ISO-8859-1 and then get them back out as UTF-8, you'll get completely different bytes.
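A sketch of that mismatch, with an illustrative helper that decodes with one charset and re-encodes with another:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MismatchDemo {
    // Decode with one charset, encode with another, and hex-dump the result
    static String reencodeHex(byte[] bytes, Charset decodeWith, Charset encodeWith) {
        String s = new String(bytes, decodeWith);
        StringBuilder hex = new StringBuilder();
        for (byte b : s.getBytes(encodeWith)) {
            hex.append(String.format("%02x", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        byte[] sent = { (byte) 0xE9 };
        // Same charset both ways: ISO-8859-1 maps every byte value one-to-one
        System.out.println(reencodeHex(sent, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1)); // e9
        // Mismatched: U+00E9 encoded as UTF-8 takes two bytes
        System.out.println(reencodeHex(sent, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8)); // c3a9
    }
}
```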

If you want to see exactly what PHP sent you, don't put the bytes in a String at all. Write a toHex method that works on byte arrays, and pass it the byte[] you read directly.
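A minimal byte-array toHex along those lines might look like this (the class name is illustrative). No String or charset is involved, so every byte value from 00 to ff is shown as-is:

```java
public class HexDump {
    // Hex-encode raw bytes directly, with no decoding step in between
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF)); // mask undoes sign extension
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex(new byte[] { 0x00, 0x7F, (byte) 0x80, (byte) 0xFF })); // 007f80ff
    }
}
```

On Java 17 and later, java.util.HexFormat.of().formatHex(bytes) produces the same lowercase output.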


Also, always check the number of bytes returned by read() and interpret only that many bytes! read() does not always fill the entire array. So your new toHex() method also needs the number of bytes read as a parameter, so that it doesn't display the stale parts of the array after them. In your case you have a one-byte array, which is not recommended because it costs one call per byte. With such an array a blocking read() returns either 1 or -1 (end of stream), so you still have to check for -1; with a larger array it frequently returns fewer bytes than the array can hold.
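A sketch of a read loop that respects the count returned by read(). The names are illustrative; in real use the stream would be clientSocket.getInputStream(), and a ByteArrayInputStream stands in for it here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadLoop {
    // Hex-encode only bytes[offset .. offset+length), ignoring the rest of the buffer
    static String toHex(byte[] bytes, int offset, int length) {
        StringBuilder sb = new StringBuilder(length * 2);
        for (int i = offset; i < offset + length; i++) {
            sb.append(String.format("%02x", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    // Read the stream to the end, hex-encoding each chunk that was actually read
    static String dumpStream(InputStream in) throws IOException {
        StringBuilder all = new StringBuilder();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) { // n may be less than buf.length
            all.append(toHex(buf, 0, n));
        }
        return all.toString();
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] { 0x0A, (byte) 0xEF, (byte) 0xBD });
        System.out.println(dumpStream(in)); // 0aefbd
    }
}
```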

As the comment above says, you might be having trouble with the string representation of the bytes in String message = new String(data);. To be certain, take the raw data bytes and encode them, for example, as Base64. You can use a library such as Apache Commons, or the built-in support in Java 8, to do that. You can do the same in PHP (base64_encode()) and compare the two results.
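With Java 8's java.util.Base64 this is a one-liner. A sketch with an illustrative class name; the same bytes passed through PHP's base64_encode() should give an identical string:

```java
import java.util.Base64;

public class Base64Dump {
    // Encode raw bytes as Base64 for comparison with PHP's base64_encode()
    static String toBase64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        System.out.println(toBase64(new byte[] { 0x44, 0x4F, 0x4C, 0x4C })); // RE9MTA==
        System.out.println(toBase64(new byte[] { (byte) 0xFF }));            // /w==
    }
}
```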

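Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.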

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM