
Apache and Python threading strange results

I'm writing some code that queries the Steam API for data on game servers, compiles the results into a larger list, and outputs JSON.

It does this by:

  • requesting a list of all the server addresses (ip, port)
  • sending a request to each ip/port
  • transforming the results
  • appending them to a master list
  • formatting the final list using json.dumps()

It's written in Python 2.7.6.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# gets all the miscreated servers

import Queue
import json
import unicodedata
from threading import Thread

import pyfscache
import valve.source.a2s
import valve.source.master_server

cache_it = pyfscache.FSCache('./cache/', days=0, hours=4, minutes=30)

# The main queue object
q = Queue.LifoQueue()
# this is the list we append to
final_servers_list = []


def print_headers():
    print "Content-type: application/json\n"
    print ""


def get_master_server_list():
    msq = valve.source.master_server.MasterServerQuerier()
    servers = msq.find(appid="299740")
    return servers


def normalize(data):
    if type(data) is unicode:
        return unicodedata.normalize('NFKD', data).encode('ascii', 'ignore')
    else:
        return data


def get_single_server_data():
    while not q.empty():  # check that the queue isn't empty
        try:
            server_address = q.get()
            _server = valve.source.a2s.ServerQuerier(server_address)
            info = _server.get_info()

            try:
                server_time = info['server_tags'].split(';')[0][-5:]
            except:
                server_time = '00:00'
            try:
                players = info['server_tags'].split(';')[1]
            except:
                players = '0'
            try:
                whitelisted = info['server_tags'].split(';')[2]
            except:
                whitelisted = '0'

            final_servers_list.append({'name': normalize(info['server_name']),
                                       'mapName': normalize(info['map']),
                                       'ip': normalize(server_address[0]),
                                       'port': normalize(server_address[1]),
                                       'time': normalize(server_time),
                                       'players': normalize(players),
                                       'whiteListed': normalize(whitelisted),
                                       'maxPlayers': normalize(info['max_players']),
                                       'version': normalize(info['version'])})
            q.task_done()
        except:
            q.task_done()


@cache_it
def get_server_list():
    master_server_list = get_master_server_list()
    for server in master_server_list:
        q.put(server)
    for i in range(200):
        t1 = Thread(target=get_single_server_data)  # target is the above function
        t1.start()  # start the thread
    q.join()
    return final_servers_list


if __name__ == '__main__':
    print_headers()
    print json.dumps(get_server_list())

Now this code works fine on my local machine, which runs a Scotch Box Vagrant LAMP stack. The Python version is the same on the server and the dev machine.

On my machine I get what I expect: about 500 servers back, with all of the data exactly as expected.

However, when I run this on the web server running apache2, it returns a list in the 10,000 range, with many of the results appearing 10-20 times. Even if I try to filter the results, some of the duplicated entries are slightly different (perhaps because they were requested a second later, or a second time, and the server tags had changed?).

I assume this has something to do with threading under Apache with Python, and that for some reason it lacks some sort of lock file, but I cannot for the life of me figure it out. I thought I had it solved by running

source /etc/apache2/envvars

then running the script from an SSH terminal,

and it started working, but the next time the cache expired and the code ran, it gave the same duplicated results back to me.

Any suggestions would be greatly appreciated, because I'm banging my head against a wall here.

As a side note, when I run apache2 -V

it spits out an error:

[Wed Feb 01 05:35:59.192112 2017] [core:warn] [pid 30832] AH00111: Config variable ${APACHE_LOCK_DIR} is not defined
[Wed Feb 01 05:35:59.192531 2017] [core:warn] [pid 30832] AH00111: Config variable ${APACHE_PID_FILE} is not defined
[Wed Feb 01 05:35:59.193300 2017] [core:warn] [pid 30832] AH00111: Config variable ${APACHE_LOG_DIR} is not defined
[Wed Feb 01 05:35:59.214298 2017] [core:warn] [pid 30832] AH00111: Config variable ${APACHE_RUN_DIR} is not defined
[Wed Feb 01 05:35:59.215112 2017] [core:warn] [pid 30832] AH00111: Config variable ${APACHE_LOG_DIR} is not defined
[Wed Feb 01 05:35:59.215499 2017] [core:warn] [pid 30832] AH00111: Config variable ${APACHE_LOG_DIR} is not defined
[Wed Feb 01 05:35:59.215708 2017] [core:warn] [pid 30832] AH00111: Config variable ${APACHE_LOG_DIR} is not defined
[Wed Feb 01 05:35:59.216057 2017] [core:warn] [pid 30832] AH00111: Config variable ${APACHE_LOG_DIR} is not defined
[Wed Feb 01 05:35:59.216272 2017] [core:warn] [pid 30832] AH00111: Config variable ${APACHE_LOG_DIR} is not defined
[Wed Feb 01 05:35:59.216595 2017] [core:warn] [pid 30832] AH00111: Config variable ${APACHE_LOG_DIR} is not defined
[Wed Feb 01 05:35:59.217080 2017] [core:warn] [pid 30832] AH00111: Config variable ${APACHE_LOG_DIR} is not defined
[Wed Feb 01 05:35:59.217475 2017] [core:warn] [pid 30832] AH00111: Config variable ${APACHE_LOG_DIR} is not defined
[Wed Feb 01 05:35:59.217812 2017] [core:warn] [pid 30832] AH00111: Config variable ${APACHE_LOG_DIR} is not defined
[Wed Feb 01 05:35:59.218115 2017] [core:warn] [pid 30832] AH00111: Config variable ${APACHE_LOG_DIR} is not defined
[Wed Feb 01 05:35:59.218369 2017] [core:warn] [pid 30832] AH00111: Config variable ${APACHE_LOG_DIR} is not defined
[Wed Feb 01 05:35:59.218657 2017] [core:warn] [pid 30832] AH00111: Config variable ${APACHE_LOG_DIR} is not defined
[Wed Feb 01 05:35:59.218885 2017] [core:warn] [pid 30832] AH00111: Config variable ${APACHE_LOG_DIR} is not defined
[Wed Feb 01 05:35:59.219117 2017] [core:warn] [pid 30832] AH00111: Config variable ${APACHE_LOG_DIR} is not defined
[Wed Feb 01 05:35:59.219348 2017] [core:warn] [pid 30832] AH00111: Config variable ${APACHE_LOG_DIR} is not defined
[Wed Feb 01 05:35:59.219631 2017] [core:warn] [pid 30832] AH00111: Config variable ${APACHE_LOG_DIR} is not defined
[Wed Feb 01 05:35:59.219845 2017] [core:warn] [pid 30832] AH00111: Config variable ${APACHE_LOG_DIR} is not defined
AH00526: Syntax error on line 75 of /etc/apache2/apache2.conf:
Invalid Mutex directory in argument file:${APACHE_LOCK_DIR}

It does this on both the dev and production machines, so I didn't think it was a major issue.

Finally, the Apache config of the site:

<VirtualHost *:80>
        ServerName "servers.miscreatedgame.com"
        ServerAdmin "csprance@entradainteractive.com"
        DocumentRoot "/var/www/servers.com/_build"

        # serverpanel appollo
        <Directory "/var/www/servers.com/_build">
                AddHandler cgi-script .py
                Options +ExecCGI
        </Directory>



        ErrorLog ${APACHE_LOG_DIR}/servers-error.log
        CustomLog ${APACHE_LOG_DIR}/servers-access.log combined

</VirtualHost>

As @SergGr said, this definitely looks like a thread race condition between multiple requests. I would suggest making your code reentrant. I would put the code that actually creates the threads and builds the whole server list in a separate process, which then returns the list via IPC to the process that handles the user's request.
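One way the separate-process idea could look is sketched below. It is only an illustration under assumptions: the worker is an inline, hypothetical script (WORKER_SOURCE) that emits a made-up server list, standing in for the real threaded collection code, and the IPC channel is simply the child's stdout carrying JSON. The function name collect_in_child_process is likewise invented for this example.

```python
import json
import subprocess
import sys

# Hypothetical worker script. In a real setup this would hold the threaded
# collection code; here it just emits a fixed, made-up list as JSON.
WORKER_SOURCE = r'''
import json, sys
servers = [{"ip": "10.0.0.%d" % i, "port": 27015} for i in range(3)]
json.dump(servers, sys.stdout)
'''


def collect_in_child_process():
    # Every call starts a fresh interpreter, so the worker's globals
    # (queue, result list) cannot leak from one request into the next.
    output = subprocess.check_output([sys.executable, '-c', WORKER_SOURCE])
    return json.loads(output.decode('utf-8'))
```

Because the child exits after each run, whatever module-level state it builds is discarded with it; the request handler only ever sees the JSON it reads back.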

In what environment is that script actually run on the server? Is it an independent script, or is it run as part of a web server (Apache)? In the latter case, don't you happen to have several concurrent (HTTP) requests for the same data?

Your q = Queue.LifoQueue() seems to be a globally shared variable that any request being processed has access to, so all concurrent requests will fill the same "queue" (q). This might be the reason why it happens randomly: it happens only when there are concurrent requests for this data.

If this is the case, the obvious fix is to make q a local variable of get_server_list and pass it explicitly to get_single_server_data using the args parameter of the Thread constructor, which is a good thing to do anyway. Obviously the same goes for the final_servers_list you use for output.
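A minimal sketch of that suggestion follows, assuming a hypothetical fake_query function in place of the real valve.source.a2s call so that it runs without network access. The addresses and workers parameters are also illustrative and not part of the original code. The import fallback lets it run on both Python 2 and Python 3.

```python
try:
    import queue  # Python 3
except ImportError:
    import Queue as queue  # Python 2, as used in the question

from threading import Thread


def fake_query(address):
    # Hypothetical stand-in for
    # valve.source.a2s.ServerQuerier(address).get_info()
    return {'ip': address[0], 'port': address[1]}


def get_single_server_data(q, results):
    while True:
        try:
            # get_nowait() cannot block forever if another worker
            # grabbed the last item in the meantime.
            address = q.get_nowait()
        except queue.Empty:
            return
        try:
            results.append(fake_query(address))
        except Exception:
            pass  # skip servers that fail to answer, as the question does
        finally:
            q.task_done()


def get_server_list(addresses, workers=8):
    q = queue.LifoQueue()  # local: a fresh queue for every call
    results = []           # local: a fresh result list for every call
    for address in addresses:
        q.put(address)
    threads = [Thread(target=get_single_server_data, args=(q, results))
               for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each call now owns its queue and its result list, so two calls, whether concurrent or sequential, cannot see each other's entries. Joining the threads directly also removes the need for q.join().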

Update

After some more thinking: what cleans final_servers_list up? Assume you just run this HTTP request 10 times (sequentially, not concurrently). Why wouldn't you expect the whole list of servers to be reported 10 times in the last response?

But the proper solution is still the same: don't use global variables. It is much more reliable and future-proof than just clearing the list.
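The accumulation described in this update can be reproduced in isolation. The toy functions below are hypothetical and perform no network access; they only contrast a module-level list with a local one when the same function is called repeatedly inside one long-lived process.

```python
final_servers_list = []  # module-level, like the list in the question


def get_server_list_global(addresses):
    # Appends to the shared list, which is never cleared between calls.
    for address in addresses:
        final_servers_list.append(address)
    return final_servers_list


def get_server_list_local(addresses):
    # Builds a fresh list on every call.
    servers = []
    for address in addresses:
        servers.append(address)
    return servers


addresses = [('10.0.0.1', 27015), ('10.0.0.2', 27015)]
for _ in range(10):
    from_global = get_server_list_global(addresses)
    from_local = get_server_list_local(addresses)

print(len(from_global))  # 20: every earlier call is still in the list
print(len(from_local))   # 2
```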
