Cache an HTTP GET request in Python sockets

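I'm building a proxy server with sockets. When a requested file isn't in the current directory (the cache), I make an HTTP GET request to the origin server (i.e. the www) and cache the response for later use.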

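The problem with my code is that every time I fetch a resource from the www it does get cached, but the content of the cached file is always "Moved Permanently".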

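Here is what happens: the user requests "stackoverflow.com" by entering "localhost:8080/stackoverflow.com" in the browser, and the browser returns the page correctly. When the user enters "localhost:8080/stackoverflow.com" a second time, the browser returns a page saying that stackoverflow.com has moved permanently.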

Here is the code of the method that makes the HTTP GET request and caches the response:

    @staticmethod
    def find_on_www(conn, requested_file):
        try:
            # Create a socket on the proxy server
            print 'Creating socket on proxy server'
            c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

            host_name = requested_file.replace("www.","",1)
            print 'Host Name: ', host_name

            # Connect to the socket to port 80
            c.connect((host_name, 80))
            print 'Socket connected to port 80 of the host'

            # Create a temporary file on this socket and ask port 80
            # for the file requested by the client
            file_object = c.makefile('r', 0)
            file_object.write("GET " + "http://" + requested_file + " HTTP/1.0\n\n")

            # Read the response into buffer
            buff = file_object.readlines()

            # Create a new file in the cache for the requested file.
            # Also send the response in the buffer to client socket
            # and the corresponding file in the cache
            temp_file = open("./" + requested_file, "wb")
            for i in range(0, len(buff)):
                temp_file.write(buff[i])
                conn.send(buff[i])

            conn.close()

If anyone is interested, here is the rest of my code:

import socket       # Socket programming
import signal       # To shut down server on ctrl+c
import time         # Current time
import os           # To get the last-modified
import mimetypes    # To guess the type of requested file
import sys          # To exit the program
from threading import Thread


def generate_header_lines(code, modified, length, mimetype):
        """ Generates the header lines for the response message """
        h = ''

        if code == 200:
            # Append status code
            h = 'HTTP/1.1 200 OK\n'
            # Append the date
            h += 'Date: ' + time.strftime("%a, %d %b %Y %H:%M:%S", time.localtime()) + '\n'
            # Append the name of the server
            h += 'Server: Proxy-Server-Thomas\n'
            # Append the date of the last modification to the file
            h += 'Last-Modified: ' + modified + '\n'

        elif code == 404:
            # Append the status code
            h = 'HTTP/1.1 404 Not Found\n'
            # Append the date
            h += 'Date: ' + time.strftime("%a, %d %b %Y %H:%M:%S", time.localtime()) + '\n'
            # Append the name of the web server
            h += 'Server: Web-Server-Thomas\n'

        # Append the length of the content
        h += 'Content-Length: ' + str(length) + '\n'
        # Append the type of the content
        h += 'Content-Type: ' + mimetype + '\n'
        # Append the connection closed - let the client know we close the connection
        h += 'Connection: close\n\n'

        return h


def get_mime_type(requested_file):
    # Get the file's mimetype and encoding
    try:
        (mimetype, encoding) = mimetypes.guess_type(requested_file, True)
        if not mimetype:
            print "Mimetype found: text/html"
            return 'text/html'
        else:
            print "Mimetype found: ", mimetype
            return mimetype

    except TypeError:
        print "Mimetype found: text/html"
        return 'text/html'


class WebServer:
    def __init__(self):
        """
        Constructor
        :return:
        """
        self.host = ''      # Host for the server
        self.port = 8000    # Port for the server

        # Create socket
        self.socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    def start_server(self):
        """ Starts the server
        :return:
        """
        # Bind the socket to the host and port
        self.socket.bind((self.host, self.port))

        print "Connection started on ", self.port

        # Start the main loop of the server - start handling clients
        self.main_loop()

    @staticmethod
    def shutdown():
        """ Shuts down the server """
        try:
            s.socket.close()
        except Exception as e:
            print "Something went wrong closing the socket: ", e

    def main_loop(self):
        """Main loop of the server"""
        while True:
            # Start listening
            self.socket.listen(1)

            # Wait for a client to connect
            client_socket, client_address = self.socket.accept()

            # Wait for a request from the client
            data = client_socket.recv(1024)

            t = Thread(target=self.handle_request, args=(client_socket, data))
            t.start()

            # # Handle the request from the client
            # self.handle_request(client_socket, data)

    def handle_request(self, conn, data):
        """ Handles a request from the client """
        # Decode the data
        string = bytes.decode(data)

        # Split the request
        requested_file = string.split(' ')
        # Get the method that is requested
        request_method = requested_file[0]

        if request_method == 'GET':
            # Get the part of the request that contains the name
            requested_file = requested_file[1]
            # Get the name of the file from the request
            requested_file = requested_file[1:]

            print "Searching for: ", requested_file

            try:
                # Open the file
                file_handler = open(requested_file, 'rb')
                # Get the content of the file
                response_content = file_handler.read()
                # Close the handler
                file_handler.close()

                # Get information about the file from the OS
                file_info = os.stat(requested_file)
                # Extract the last modified time from the information
                time_modified = time.ctime(file_info[8])
                # Get the time modified in seconds
                modified_seconds = os.path.getctime(requested_file)

                print "Current time: ", time.time()
                print "Modified: ", time_modified

                if (float(time.time()) - float(modified_seconds)) > 120:  # more than 2 minutes
                    print "Time outdated!"
                    #self.find_on_www(conn, requested_file)

                # Get the file's mimetype and encoding
                mimetype = get_mime_type(requested_file)

                print "Mimetype = ", mimetype

                # Create the correct header lines
                response_headers = generate_header_lines(200, time_modified, len(response_content), mimetype)

                # Create the response to the request
                server_response = response_headers.encode() + response_content

                # Send the response back to the client
                conn.send(server_response)

                # Close the connection
                conn.close()

            except IOError:  # Couldn't find the file in the cache - Go find file on www
                print "Error: " + requested_file + " not found in cache!"
                self.find_on_www(conn, requested_file)

    @staticmethod
    def find_on_www(conn, requested_file):
        try:
            # Create a socket on the proxy server
            print 'Creating socket on proxy server'
            c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

            host_name = requested_file.replace("www.","",1)
            print 'Host Name: ', host_name

            # Connect to the socket to port 80
            c.connect((host_name, 80))
            print 'Socket connected to port 80 of the host'

            # Create a temporary file on this socket and ask port 80
            # for the file requested by the client
            file_object = c.makefile('r', 0)
            file_object.write("GET " + "http://" + requested_file + " HTTP/1.0\n\n")

            # Read the response into buffer
            buff = file_object.readlines()

            # Create a new file in the cache for the requested file.
            # Also send the response in the buffer to client socket
            # and the corresponding file in the cache
            temp_file = open("./" + requested_file, "wb")
            for i in range(0, len(buff)):
                temp_file.write(buff[i])
                conn.send(buff[i])

            conn.close()

        except Exception as e:
            # Generate a body for the file - so we don't have an empty page
            response_content = "<html><body><p>Error 404: File not found</p></body></html>"

            # Generate the correct header lines
            response_headers = generate_header_lines(404, '', len(response_content), 'text/html')

            # Create the response to the request
            server_response = response_headers.encode() + response_content

            # Send the response back to the client
            conn.send(server_response)

            # Close the connection
            conn.close()


def shutdown_server(sig, dummy):
    """ Shuts down the server """

    # Shutdown the server
    s.shutdown()

    # exit the program
    sys.exit(1)

# Shut down on ctrl+c
signal.signal(signal.SIGINT, shutdown_server)

# Create a web server
s = WebServer()
# Start the server
s.start_server()

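The problem with your code is that if you request a page that returns status code 301 (Moved Permanently), that status is part of the response headers, and the whole response, headers included, is written to the cache. When you view a page that isn't in the cache yet, you pass an exact copy of the origin server's response to the proxy's GET request straight to the client. The 301 tells the client to make another GET request to the new location, and that request bypasses your proxy server.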

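The second time you request the page through the proxy server, it retrieves the earlier response from the cache. That file contains the headers from the earlier response, which correctly include the redirect status code, but you then prepend your own "200 OK" status line to the message you return. Because the client reads that status code first, it doesn't realize that you want it to make another request to find the redirected page. It therefore just displays the page telling you that the page has moved.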
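To make the cache-hit behaviour concrete, here is a small sketch of what the client ends up receiving. It is written for Python 3 (the question's code is Python 2), the cached bytes are a made-up example of a 301 response, and `wrap_like_cache_hit` is a hypothetical helper that only imitates what the cache-hit branch of `handle_request` does, not the original code itself.

```python
# Hypothetical contents of the cache file: the raw origin response,
# status line and headers included, as find_on_www would have stored it.
cached_file = (
    b"HTTP/1.1 301 Moved Permanently\r\n"
    b"Location: https://stackoverflow.com/\r\n"
    b"Content-Length: 0\r\n"
    b"\r\n"
)


def wrap_like_cache_hit(cached_bytes):
    """Imitate the cache-hit branch: prepend a fresh 200 header block."""
    header = (
        "HTTP/1.1 200 OK\n"
        "Content-Length: " + str(len(cached_bytes)) + "\n"
        "Content-Type: text/html\n"
        "Connection: close\n\n"
    )
    return header.encode() + cached_bytes


response = wrap_like_cache_hit(cached_file)

# The first line is what the browser acts on.
status_line = response.split(b"\n", 1)[0]
# Everything after the proxy's own blank line is treated as the page body.
body = response.split(b"\n\n", 1)[1]

print(status_line)
print(body.splitlines()[0])
```

The browser sees a 200 status line, so it renders the body, and the body is the stored 301 response rather than a real page.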

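What you need to do is parse the headers that the web server returns when the proxy server has to fetch the actual page from the internet, and then return the correct headers to the client based on them.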
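One way to approach the parsing step is sketched below, again in Python 3. The names `parse_response` and `should_cache` are made up for this illustration, and the rule "only cache plain 200 responses" is an assumption of the sketch, not something the question's code or the HTTP caching rules require. A proxy could just as well follow the Location header itself and cache the final page.

```python
def parse_response(raw):
    """Split a raw HTTP response into (status_code, headers, body)."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        # Tolerate responses that use bare LF line endings.
        head, sep, body = raw.partition(b"\n\n")

    lines = head.decode("iso-8859-1").splitlines()
    if not lines:
        raise ValueError("empty response")

    # The status line looks like: HTTP/1.1 301 Moved Permanently
    parts = lines[0].split(" ", 2)
    if len(parts) < 2 or not parts[1].isdigit():
        raise ValueError("malformed status line: %r" % lines[0])
    status_code = int(parts[1])

    # Header names are case-insensitive, so store them lower-cased.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    return status_code, headers, body


def should_cache(status_code):
    """Assumption for this sketch: only plain 200 responses are cached."""
    return status_code == 200


# Made-up redirect response, similar to what the origin server sent.
redirect = (
    b"HTTP/1.1 301 Moved Permanently\r\n"
    b"Location: https://stackoverflow.com/\r\n"
    b"Content-Length: 0\r\n"
    b"\r\n"
)
code, headers, body = parse_response(redirect)
print(code, headers["location"], should_cache(code))
```

With something like this in `find_on_www`, the proxy could relay a 3xx response to the client unchanged without writing it to the cache, and store only the body of a 200 response, so that the headers generated on a cache hit describe what is actually in the file.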
