
Connection reset by peer [Errno 104] in Python

I have written a distributed program. Every node (virtual machine) in the network sends data to every other node through outgoing connections and receives data from every other node through incoming connections. Before any data is sent, every node has opened a socket to every other node (including the single source node). After a delay of 3 seconds, the source starts sending a different file chunk to each of the other nodes in the network. Every node starts forwarding the chunk it is receiving as soon as the first packet arrives.

The program finishes successfully many times without any error. But sometimes one random node resets its incoming connections (while still sending data through its outgoing connections).

Each node has n-2 sender threads and n-1 receiver threads.

Sending function:

def relaySegment_Parallel(self):
        connectionInfoList = []
        seenSegments = []
        readyServers = []
        BUFFER_SIZE = Node.bufferSize
        while len(readyServers) < self.connectingPeersNum-len(Node.sources) and self.isMainThreadActive(): #Data won't be relayed to the sources
            try:
                tempIp = None
                for ip in Node.IPAddresses:
                    if ip not in readyServers and ip != self.ip and ip not in self.getSourcesIp():
                        tempIp = ip
                        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
                        s.connect((ip, Node.dataPort))
                        connectionInfoList.append((s, ip))
                        readyServers.append(ip)
                        if Node.debugLevel2Enable:
                            print "RelayHandler: Outgoing connection established with IP:  " + str(ip)
            except socket.error, v:
                errorcode = v[0]
                if  errorcode == errno.ECONNRESET:
                    print "(RelayHandler) Connection reset ! Node's IP: " + str(tempIp)
                if errorcode == errno.ECONNREFUSED:
                    print "(RelayHandler) Node " + str(tempIp) + " are not ready yet!"
                continue
            except:
                print "Error: Cannot connect to IP: " + str (tempIp)
                continue
            print "(RelayHandler) Ready to relay data to " + str(len(readyServers)) + " numeber of servers."
        try:
            pool = ThreadPool(processes = Node.threadPoolSize)
            while Node.terminateFlag == 0 and not self.isDistributionDone() and self.isMainThreadActive():
                if len(self.toSendTupleList) > 0:
                    self.toSendLock.acquire()
                    segmentNo, segmentSize, segmentStartingOffset, data = self.toSendTupleList.pop(0)
                    self.toSendLock.release()
                    if len(data) > 0:
                        if segmentNo not in seenSegments:
                            #Type: 0 = From Source , 1 = From Relayer
                            #Sender Type/Segment No./Segment Size/Segment Starting Offset/
                            tempList = []
                            for s, ip in connectionInfoList:
                                tempData = "1/" + str(self.fileSize) + "/"  + str(segmentNo) + "/" + str(segmentSize) + "/" + str(segmentStartingOffset) + "/"
                                tempList.append((s, ip, tempData))
                            pool.map(self.relayWorker, tempList)
                            seenSegments.append(segmentNo)
                        relayList = []
                        for s, ip in connectionInfoList:
                            relayList.append((s, ip, data))
                        pool.map(self.relayWorker, relayList)
            for s, ip in connectionInfoList:
                s.shutdown(1)# 0: Further receives are disallowed -- 1: Further sends are disallowed -- 2: Further sends and receives are disallowed.
                s.close()
            pool.close()
            pool.join()
        except socket.error, v:
            errorcode=v[0]
            if errorcode==errno.ECONNREFUSED:
                print "(RelayHandler) Error: Connection Refused in RelaySegment function. It can not connect to: ", ip
            else:
                print "\n(RelayHandler) Error1 in relaying segments (Parallel) to ", ip, " !!! ErrorCode: ", errorcode
            traceback.print_exception(*sys.exc_info())
        except:
            print "\n(RelayHandler) Error2 in relaying segments (Parallel) to ", ip
            traceback.print_exception(*sys.exc_info())

Receiving function:

def receiveDataHandler(self):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)# Allows us to reuse the port immediately after termination of the program
            s.bind((self.ip, Node.dataPort))
            s.listen(Node.MaxNumClientListenedTo)
            threadsList = []
            fHandler = fileHandler(self.inFileAddr, Node.bufferSize)
            isStart = False
            executionTime = 0
            connectedPeersSofar = 0
            while (not self.connectingPeersNum == connectedPeersSofar)  and  self.isMainThreadActive() and Node.terminateFlag == 0 and not self.isReceptionDone():
                conn, ipAddr = s.accept()
                thread_receiveData = Thread2(target = self.receiveData_Serial, args = (conn, ipAddr, fHandler))
                thread_receiveData.start()
                if Node.debugLevel2Enable:
                    print 'Receive Handler: New thread started for connection from address:', ipAddr
                connectedPeersSofar += 1
                threadsList.append(thread_receiveData)
                if isStart == False:
                    isStart = True
            print "(ReceiverHandler) Receiver stops listening: Peers Num "+str(self.connectingPeersNum) + " connected peers so far: " + str(connectedPeersSofar)
            for i in range(0, len(threadsList)):
                self.startTime = threadsList[i].join()
            if isStart:
                executionTime = float(time.time()) - float(self.startTime)
            else:
                print "\n\t No Start! Execution Time: --- 0 seconds ---" , "\n"
            s.shutdown(2)# 0: Further receives are disallowed -- 1: Further sends are disallowed -- 2: Further sends and receives are disallowed.
            s.close()
            return executionTime
        except socket.error, v:
            errorcode = v[0]
            if errorcode == 22: # 22: Invalid argument
                print "Error: Invalid argument in connection acceptance (receive data handler)"
            elif errorcode==errno.ECONNREFUSED:
                print "Error: Connection Refused in receive"
            else:
                print "Error1 in Data receive Handler !!! ErrorCode: ", errorcode
            traceback.print_exception(*sys.exc_info())
        except:
            print "Error2 in Data receive Handler !!!"
            traceback.print_exception(*sys.exc_info())

The sending thread of every node prints that the node is connected to all other nodes (including the random malfunctioning node). However, the receiving function of the random node blocks on s.accept() and does not accept any connection except the one from the single source, which is the last one to connect. The random node just waits without raising any exception.

It seems that s.listen() (the TCP protocol) on the random node makes the senders think that they are connected, while s.accept() does not accept anyone but the last one. Then, for some reason, it resets the connection, and that is why the other senders raise the "Connection reset by peer" exception when they try to send data. The only sender that finishes its job without any error is the source, which is the last one to connect.
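The observation that senders believe they are connected before s.accept() has run matches how the listen backlog usually behaves: on typical TCP stacks the kernel completes the handshake for connections queued by listen(), so connect() (and even a small sendall()) can succeed before the server accepts. The following is a minimal sketch of that behavior, written for Python 3 rather than the Python 2 used in the question. The function name connect_before_accept, the loopback address, and the 5-second timeouts are assumptions made for the illustration and are not part of the original program.

```python
import socket


def connect_before_accept(payload=b"chunk"):
    """Show that connect() and sendall() can succeed before accept() runs."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    server.listen(5)                # pending connections wait in the backlog
    server.settimeout(5)
    port = server.getsockname()[1]

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(5)
    try:
        # The server has NOT called accept() yet, but the handshake completes.
        client.connect(("127.0.0.1", port))
        # A small payload is buffered by the kernel on the server side.
        client.sendall(payload)

        # Only now does the server pick the connection out of the backlog.
        conn, _addr = server.accept()
        conn.settimeout(5)
        try:
            received = b""
            while len(received) < len(payload):
                part = conn.recv(1024)
                if not part:        # peer closed before everything arrived
                    break
                received += part
        finally:
            conn.close()
    finally:
        client.close()
        server.close()
    return received
```

A successful connect() therefore only shows that the peer is listening, not that a receiver thread is already serving the connection.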

Error:

Traceback (most recent call last):
  File "/home/ubuntu/DCDataDistribution/Node.py", line 137, in relayWorker
    socketConn.sendall(data)
  File "/usr/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 104] Connection reset by peer
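The traceback shows the reset surfacing from sendall() inside relayWorker. As a hedged sketch of how such a failure could be caught and labeled per connection, the Python 3 snippet below compares the exception's errno attribute against the constants in the errno module (the question's Python 2 code reads v[0] instead). The helpers classify_socket_error and send_chunk are hypothetical names introduced for this illustration; they are not part of the original Node class.

```python
import errno
import socket


def classify_socket_error(exc):
    """Return a short label for a socket-related OSError (hypothetical helper)."""
    if exc.errno == errno.ECONNRESET:
        return "reset by peer"
    if exc.errno == errno.ECONNREFUSED:
        return "refused"
    if exc.errno == errno.EPIPE:
        return "broken pipe"
    return "other"


def send_chunk(conn, data):
    """Send data on a connected socket and report how the attempt ended."""
    try:
        conn.sendall(data)
        return "ok"
    except socket.timeout:
        # socket.timeout is an OSError subclass, so it is handled first.
        return "timed out"
    except OSError as exc:
        return classify_socket_error(exc)
```

Returning a label instead of printing lets the caller decide whether to retry, drop the peer, or stop relaying.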

Why is that happening?

FYI: I am running my program on Amazon EC2 instances. Each instance is a t2.micro (1 vCPU, 2.5 GHz Intel Xeon family (up to 3.3 GHz), and 1 GiB of memory). Ubuntu Server 14.04 LTS (HVM) is running on every instance.

            for s, ip in connectionInfoList:
                s.shutdown(1)# 0: Further receives are disallowed -- 1: Further sends are disallowed -- 2: Further sends and receives are disallowed.
                s.close()
            pool.close()
            pool.join()

You shut down the connections while some relayWorker threads in the pool may still be unfinished. Reverse the order:

                pool.close()
                pool.join()
                for s, ip in connectionInfoList:
                    s.close()
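The reordering can be sketched as a small self-contained Python 3 function: all pool work is drained with close() and join() first, and only then are the sockets shut down and closed. The names relay_worker and relay_then_close are hypothetical, and map_async is used here only to make the ordering visible (the question's code calls pool.map); treat this as an illustration under those assumptions, not as the original program's implementation.

```python
import socket
from multiprocessing.pool import ThreadPool


def relay_worker(args):
    """Send one payload over an already-connected socket (hypothetical helper)."""
    conn, payload = args
    conn.sendall(payload)
    return len(payload)


def relay_then_close(connections, payload, pool_size=4):
    """Send payload on every connection, then close them once workers finish."""
    pool = ThreadPool(processes=pool_size)
    pending = pool.map_async(relay_worker, [(c, payload) for c in connections])

    # Drain the pool first: no worker may still be inside sendall().
    pool.close()
    pool.join()

    # Only now is it safe to shut down and close the sockets.
    for conn in connections:
        try:
            conn.shutdown(socket.SHUT_WR)   # same as shutdown(1): no more sends
        except OSError:
            pass                            # peer may already have gone away
        conn.close()

    # Re-raises any exception that a worker hit while sending.
    return pending.get()
```

Calling pending.get() after the sockets are closed still reports worker errors, so a failed send is not silently lost.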
