
Proxy server - wrong encoding

I'm trying to write a simple proxy server (one that handles GET requests). I've written the following code:

public void handle(Socket socket) throws IOException, URISyntaxException {

    /* CLIENT -> SERVER */
    Scanner clientInputScanner = new Scanner(socket.getInputStream());

    List<String> clientHeaders = new ArrayList<String>();

    String line;
    String targetUrl = null;

    boolean firstLine = true;

    while ((line = clientInputScanner.nextLine()) != null) {

        if (line.length() <= 0) {
            break;
        }

        if (firstLine) {

            String[] tokens = line.split(" ");
            targetUrl = tokens[1];

            line = tokens[0] + " " + this.extractPath(tokens[1]) + " " + tokens[2];

            firstLine = false;
        }

        clientHeaders.add(line);
    }


    Socket server = new Socket(this.extractHostName(targetUrl), 80);
    PrintWriter serverPrint = new PrintWriter(server.getOutputStream());

    for (String header: clientHeaders) {
        serverPrint.println(header);
    }

    serverPrint.println("");
    serverPrint.flush();

    /* SERVER -> CLIENT */
    Scanner serverScanner = new Scanner(server.getInputStream());
    PrintWriter clientPrinter = new PrintWriter(socket.getOutputStream());

    List<String> serverHeaders = new ArrayList<String>();
    int serverContentLength = 0;

    while ((line = serverScanner.nextLine()) != null) {

        if (line.length() <= 0) {
            break;
        }

        serverHeaders.add(line);

        if (line.startsWith("Content-Length: ")) {
            // content-length
            int index = line.indexOf(':') + 1;
            String len = line.substring(index).trim();
            serverContentLength = Integer.parseInt(len);
        }
    }

    for (String header: serverHeaders) {
        clientPrinter.println(header);
    }

    clientPrinter.println("");
    clientPrinter.flush();

    if (serverContentLength > 0) {

        InputStream serverReader = server.getInputStream();
        OutputStream clientWriter = socket.getOutputStream();

        byte[] buff = new byte[1024];
        int bytesRead;
        int count = 0;

        while ((bytesRead = serverReader.read(buff)) != -1) {

            if (count == serverContentLength) {
                break;
            }

            clientWriter.write(buff, 0, bytesRead);
            clientWriter.flush();
            count += bytesRead;
        }

        clientWriter.close();
        serverReader.close();
    }

    clientInputScanner.close();
}

The problem is encoding: the web browser cannot understand the response body (it shows strange characters). I'm passing raw bytes (without interpreting them as chars), so I don't know what could be wrong. The Content-Type header is passed through properly (with the correct encoding).

NOTE: this is just code for a POC; I only need to get it to work. So the code style is ugly :)

The Scanner reads an entire buffer's worth of data; it does not stop reading at the end of the current line. So the Scanner will already have read data from your body: by the time you call InputStream serverReader = server.getInputStream();, some (or all) of the body has already been consumed by the Scanner.
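To see this read-ahead in isolation, here is a small sketch that uses an in-memory stream instead of a socket. The class name ScannerOverreadDemo and the sample response are made up for illustration; the point is only that after a single nextLine() call, the underlying stream has fewer unread bytes than the body alone would need.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

// Hypothetical demo class, not part of the original proxy code.
final class ScannerOverreadDemo {

    /** Reads one line through a Scanner, then reports how many bytes are still unread in the stream. */
    static int bytesLeftAfterOneLine(byte[] response) {
        ByteArrayInputStream in = new ByteArrayInputStream(response);
        Scanner scanner = new Scanner(in, "ISO-8859-1");
        scanner.nextLine();      // the caller only asked for the status line...
        return in.available();   // ...but the Scanner has pulled more into its own buffer
    }

    public static void main(String[] args) {
        byte[] response = "HTTP/1.1 200 OK\r\n\r\nBODY-BYTES".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("total bytes:         " + response.length);
        System.out.println("unread after 1 line: " + bytesLeftAfterOneLine(response));
    }
}
```

Any bytes the Scanner has buffered are gone from the stream's point of view, so a later read() on the same stream starts somewhere inside (or after) the body.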

You will have to stick to one class that reads from the socket, and since you want to read binary data, that will have to be a plain InputStream. BufferedReader and Scanner can't be used because they fill their buffers beyond the end of the line.
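For the body itself, a rough sketch of copying exactly Content-Length bytes from one plain stream to another might look like this. BodyCopy and copyExactly are illustrative names, and the sketch assumes the length has already been parsed from the headers; it does not handle chunked transfer encoding.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of the original proxy code.
final class BodyCopy {

    /** Copies exactly {@code length} bytes from in to out without decoding them as characters. */
    static void copyExactly(InputStream in, OutputStream out, long length) throws IOException {
        byte[] buf = new byte[8192];
        long remaining = length;
        while (remaining > 0) {
            // Never ask for more than is left, so nothing after the body is consumed.
            int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n == -1) {
                throw new EOFException("stream ended with " + remaining + " body bytes missing");
            }
            out.write(buf, 0, n);
            remaining -= n;
        }
        out.flush();
    }
}
```

Capping each read at the remaining count means the loop ends as soon as the body is complete, instead of blocking in read() until the server closes the connection.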

You can implement your own readLine method on InputStream - as long as you stop reading when you've seen the end of the line, the rest of the data will still be there for you to consume as part of the body.
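A minimal sketch of such a readLine, assuming header lines end in LF (optionally preceded by CR) and can be decoded as ISO-8859-1; the class name RawHttpLines is made up for this example:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original proxy code.
final class RawHttpLines {

    /**
     * Reads bytes up to and including the next '\n' and returns the line
     * without its CR/LF terminator, or null if the stream is already at EOF.
     * No byte after the '\n' is consumed.
     */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                break;
            }
            line.write(b);
        }
        if (b == -1 && line.size() == 0) {
            return null; // end of stream, nothing read
        }
        byte[] bytes = line.toByteArray();
        int len = bytes.length;
        if (len > 0 && bytes[len - 1] == '\r') {
            len--; // drop the CR of a CRLF terminator
        }
        return new String(bytes, 0, len, StandardCharsets.ISO_8859_1);
    }
}
```

Reading one byte at a time from a raw socket stream is slow, so in practice you would probably wrap it once in a BufferedInputStream and use that same wrapped stream for both the headers and the body.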

This may or may not explain the strange characters - we'd need to know what data you are sending and how you're viewing the data to be sure.
