How to run parallel tests in Django with Elasticsearch-dsl?

Question

Has anyone gotten parallel tests running with Django and Elasticsearch? If so, could you share the configuration changes needed to make it work?

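I've tried almost everything I can think of to make this work, including the solution outlined here. Taking inspiration from how Django itself handles parallel databases, I currently create a custom ParallelTestSuite that overrides init_worker to iterate over each index/document type and change the index names, roughly like this: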

from django.db import connections
from django_elasticsearch_dsl.registries import registry

_worker_id = 0

def _elastic_search_init_worker(counter):
    global _worker_id

    with counter.get_lock():
        counter.value += 1
        _worker_id = counter.value

    for alias in connections:
        connection = connections[alias]
        settings_dict = connection.creation.get_test_db_clone_settings(_worker_id)
        # connection.settings_dict must be updated in place for changes to be
        # reflected in django.db.connections. If the following line assigned
        # connection.settings_dict = settings_dict, new threads would connect
        # to the default database instead of the appropriate clone.
        connection.settings_dict.update(settings_dict)
        connection.close()

    ### Everything above this is from the Django version of this function ###

    # Update index names in doctypes
    for doc in registry.get_documents():
        doc._doc_type.index += f"_{_worker_id}"

    # Update index names for indexes and create new indexes
    for index in registry.get_indices():
        index._name += f"_{_worker_id}"
        index.delete(ignore=[404])
        index.create()

    print(f"Started thread # {_worker_id}")
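The id handling at the top of that initializer is the same pattern Django uses: every worker process shares one multiprocessing.Value counter and increments it under its lock to claim a unique, 1-based id, which then becomes the index-name suffix. A stdlib-only sketch of that pattern, assuming a POSIX system with the fork start method; init_worker, index_name_for, run and the base name "articles" are hypothetical illustration names, not Django or elasticsearch-dsl APIs:

```python
import multiprocessing

_worker_id = 0  # set once in each worker process by init_worker()


def init_worker(counter):
    """Pool initializer: claim a unique, 1-based worker id from the shared counter."""
    global _worker_id
    with counter.get_lock():
        counter.value += 1
        _worker_id = counter.value


def index_name_for(base_name):
    """Runs inside a worker: derive that worker's private index name."""
    return f"{base_name}_{_worker_id}"


def run(num_workers=2, tasks=8):
    # Explicit "fork" context (POSIX only): each worker starts as a copy of the
    # parent process, including any objects (and sockets) the parent already had.
    ctx = multiprocessing.get_context("fork")
    counter = ctx.Value("i", 0)
    with ctx.Pool(num_workers, initializer=init_worker, initargs=(counter,)) as pool:
        names = pool.map(index_name_for, ["articles"] * tasks)
    return names, counter.value


if __name__ == "__main__":
    print(run())
```

Which worker handles which task is up to the pool, so only the set of possible names ("articles_1", "articles_2") is predictable, not their order.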

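This generally seems to work, but some odd things happen seemingly at random (i.e. rerunning the test suite does not reliably reproduce the problem, and/or the error message changes). These are the various errors I've run into; on each test run, one of them seems to fail at random: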

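A 404 is raised when trying to create an index in the function above (I've confirmed it is a 404 returned from the PUT request, yet the Elasticsearch server log says it created the index without any problem)
A 500 when trying to create an index, though this one hasn't happened in a while, so I think something else fixed it
The query response sometimes has no items dictionary value in the elasticsearch library's _process_bulk_chunk function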

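I suspect something strange is happening at the connection layer (e.g. connections between Django's test-runner processes somehow mixing up responses?), but I don't see how that's possible, since Django uses multiprocessing to parallelize the tests and each one therefore runs in its own process. Is it possible that the spawned processes are still trying to use the original process's connection pool or something similar? I really don't know what else to try from here, and would greatly appreciate some pointers, or even just confirmation that this is actually possible.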

Answer 1

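I suspect something strange is happening at the connection layer (e.g. connections between Django's test-runner processes somehow mixing up responses?), but I don't see how that's possible, since Django uses multiprocessing to parallelize the tests and each one therefore runs in its own process. Is it possible that the spawned processes are still trying to use the original process's connection pool or something similar?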

This is exactly what's happening. From the Elasticsearch DSL documentation:

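Since we use persistent connections throughout the client, it means that the client doesn't tolerate fork very well. If your application calls for multiple processes, make sure you create a fresh client after the call to fork. Note that Python's multiprocessing module uses fork to create new processes on POSIX systems.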
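One general way to follow that advice is to cache one client per process and discard the inherited instance in the child immediately after fork, so the child's first use builds its own. A stdlib-only sketch, assuming Python 3.7+ on POSIX (os.register_at_fork is not available on Windows); FakeClient, get_client and child_gets_fresh_client are hypothetical stand-ins, and with the real library the lazy creation step would presumably call elasticsearch_dsl's connections.create_connection instead:

```python
import itertools
import os

_client_ids = itertools.count(1)


class FakeClient:
    """Hypothetical stand-in for an Elasticsearch client holding persistent connections."""

    def __init__(self):
        self.id = next(_client_ids)
        self.pid = os.getpid()  # process that created (and should own) this client


_client = None


def get_client():
    """Return this process's client, creating it lazily on first use."""
    global _client
    if _client is None:
        _client = FakeClient()
    return _client


def _forget_client_in_child():
    # Runs in the child right after fork(): drop the inherited client so the
    # child's first get_client() call creates a fresh one of its own.
    global _client
    _client = None


os.register_at_fork(after_in_child=_forget_client_in_child)


def child_gets_fresh_client():
    """Fork once and report whether the child ended up with its own client."""
    parent_client = get_client()
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        child_client = get_client()
        ok = child_client is not parent_client and child_client.pid == os.getpid()
        os.write(w, b"1" if ok else b"0")
        os._exit(0)
    os.close(w)
    answer = os.read(r, 1)
    os.close(r)
    os.waitpid(pid, 0)
    return answer == b"1"


if __name__ == "__main__":
    print(child_gets_fresh_client())
```

Because multiprocessing's fork start method goes through os.fork, the same hook also fires for pool workers.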

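What I observed was that responses were interleaved, very strangely, across seemingly random clients that might have issued a request. So a request to index a document could end up receiving the response to an index-creation request, which has very different attributes.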
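A toy model of how that mix-up can happen: after fork, parent and child hold the same underlying socket, so whichever process reads first gets whatever reply is at the front of the shared buffer. This sketch uses a local socketpair in place of an HTTP connection to Elasticsearch (an assumption for illustration; it is not how elasticsearch-py's transport is implemented), is POSIX-only, and child_reply_on_shared_socket is a hypothetical name:

```python
import os
import socket


def child_reply_on_shared_socket():
    """Toy model of a forked child reusing its parent's open client connection."""
    client, server = socket.socketpair()  # "client" end plays the pooled connection

    # Parent issues a request; the "server" answers at once, but the parent
    # has not read the answer yet, so it sits in the shared socket buffer.
    client.sendall(b"index-document\n")
    request = server.recv(1024).strip()
    server.sendall(b"response-to:" + request + b"\n")

    r, w = os.pipe()
    pid = os.fork()  # the child inherits the very same client socket
    if pid == 0:
        # Child sends a different request over the inherited socket...
        client.sendall(b"create-index\n")
        # ...and reads whatever is first in the shared buffer: the parent's reply.
        os.write(w, client.recv(1024))
        os._exit(0)

    os.close(w)
    child_reply = os.read(r, 1024)
    os.close(r)
    os.waitpid(pid, 0)
    client.close()
    server.close()
    return child_reply


if __name__ == "__main__":
    print(child_reply_on_shared_socket())
```

The child asked to create an index but received the reply meant for the parent's indexing request, which is the kind of mismatch that would leave a bulk helper looking for an items key that is not there.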

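The fix is to make sure each test worker has its own Elasticsearch client. You can do this by creating worker-specific connection aliases and then overriding the current connection aliases with the worker-specific ones (via the private attribute _using). Here is a modified version of the code you posted, with the changes: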

from django.conf import settings
from django.db import connections
from django_elasticsearch_dsl.registries import registry
# Aliased so it does not shadow django.db.connections, which the loop below still needs
from elasticsearch_dsl.connections import connections as es_connections

_worker_id = 0


def _elastic_search_init_worker(counter):
    global _worker_id

    with counter.get_lock():
        counter.value += 1
        _worker_id = counter.value

    for alias in connections:
        connection = connections[alias]
        settings_dict = connection.creation.get_test_db_clone_settings(_worker_id)
        # connection.settings_dict must be updated in place for changes to be
        # reflected in django.db.connections. If the following line assigned
        # connection.settings_dict = settings_dict, new threads would connect
        # to the default database instead of the appropriate clone.
        connection.settings_dict.update(settings_dict)
        connection.close()

    ### Everything above this is from the Django version of this function ###

    # Each worker needs its own connection to Elasticsearch: the client keeps
    # persistent connections in global objects that do not play nice with fork().
    # create_connection() builds a fresh client under each worker-specific alias
    # (configure() would instead replace the whole alias configuration).
    worker_connection_postfix = f"_worker_{_worker_id}"
    for alias, es_kwargs in settings.ELASTICSEARCH_DSL.items():
        es_connections.create_connection(alias + worker_connection_postfix, **es_kwargs)

    # Update index names in doctypes
    for doc in registry.get_documents():
        doc._doc_type.index += f"_{_worker_id}"
        # Use the worker-specific connection
        doc._doc_type._using = doc._doc_type._using + worker_connection_postfix

    # Update index names for indexes and create new indexes on the worker's own client
    for index in registry.get_indices():
        index._name += f"_{_worker_id}"
        index._using = index._using + worker_connection_postfix
        index.delete(ignore=[404])
        index.create()

    print(f"Started thread # {_worker_id}")
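Since the worker-specific names are pure string manipulation, one option is to pull them into small helpers that can be unit-tested without Django or a running cluster. A sketch; worker_es_settings and worker_index_name are hypothetical helpers, not part of django-elasticsearch-dsl, and the settings shape assumed is the usual ELASTICSEARCH_DSL mapping of alias to client keyword arguments:

```python
def worker_es_settings(es_settings, worker_id):
    """Map each configured alias to a worker-specific alias with the same client kwargs."""
    postfix = f"_worker_{worker_id}"
    # Copy each kwargs dict so later per-worker tweaks cannot leak into the shared settings.
    return {alias + postfix: dict(kwargs) for alias, kwargs in es_settings.items()}


def worker_index_name(base_name, worker_id):
    """Per-worker index name using the same f"_{worker_id}" suffix as the initializer."""
    return f"{base_name}_{worker_id}"


if __name__ == "__main__":
    example_settings = {"default": {"hosts": "localhost:9200"}}
    print(worker_es_settings(example_settings, 2), worker_index_name("articles", 2))
```

Inside the initializer, each returned alias and its kwargs would then be handed to elasticsearch_dsl's connections.create_connection(alias, **kwargs).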

Answer 1: score 0, accepted, 2022-05-23 03:58:19

如何使用 Elasticsearch-dsl 在 Django 中进行并行测试？

问题描述

1 个解决方案

解决方案1 0 已采纳 2022-05-23 03:58:19

解决方案1
0 已采纳 2022-05-23 03:58:19