
Django: queries in a multiprocessing pool raise django.db.utils.OperationalError: SSL error

I am using Django 1.11.1 with PostgreSQL as the database.

Here is the code:

models.py

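from django.db import models


class Symbol(StockDataBaseModel):  # StockDataBaseModel: project-specific base model (not shown)
    code = models.CharField(max_length=20)
    name = models.CharField(max_length=30)


class DailyPrice(StockDataBaseModel):
    # on_delete is optional in Django 1.11 (defaults to CASCADE) but required from Django 2.0
    symbol = models.ForeignKey(Symbol, on_delete=models.CASCADE)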

    date_time = models.DateTimeField()
    open = models.DecimalField(max_digits=15, decimal_places=2)
    high = models.DecimalField(max_digits=15, decimal_places=2)
    low = models.DecimalField(max_digits=15, decimal_places=2)
    close = models.DecimalField(max_digits=15, decimal_places=2)

The data-fetching code:

from multiprocessing import Pool

from django.db.models import Q


def _get_price_df(ticker, start_date, end_date):
    from data_manager.models import DailyPrice, Symbol

    symbol_set = Symbol.objects.prefetch_related('dailyprice_set').filter(name__iexact=ticker)
    if symbol_set.exists():
        symbol = symbol_set[0]
        if start_date and end_date:
            price_set = symbol.dailyprice_set.filter(Q(date_time__gte=start_date) & Q(date_time__lte=end_date))
        elif start_date:
            price_set = symbol.dailyprice_set.filter(Q(date_time__gte=start_date))
        elif end_date:
            price_set = symbol.dailyprice_set.filter(Q(date_time__lte=end_date))
        else:
            price_set = symbol.dailyprice_set.all()   
    else:
        raise ValueError("No such ticker exists : {}".format(ticker))


class StockDailyDataManager(object):

    def get_price_data_df(self, tickers, start_date=None, end_date=None):
        pool = Pool(12)
        df_list = pool.starmap(
            _get_price_df, [(ticker, start_date, end_date) for ticker in tickers]
        )
        return df_list

This is how I call it:

tickers = ['MSFT', 'AAPL', 'samsung']  # ...
pool = Pool(12)
df_list = pool.starmap(
    _get_price_df,
    [(ticker, start_date, end_date) for ticker in tickers]
)

This raises the following error:

django.db.utils.OperationalError: SSL error: decryption failed or bad record mac

How can I solve this?

I ran into this in a management command that used a multiprocessing pool. Calling

from django import db
db.connections.close_all()

at the very start of each process fixed it. After that, whenever a process accessed the database, it opened a new connection of its own instead of sharing the connection inherited from the parent.
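To see why sharing the parent's connection goes wrong, here is a minimal sketch that needs neither Django nor PostgreSQL, assuming a POSIX system where `os.fork` is available. A `socket.socketpair()` stands in for the client-to-database connection, and the function name is made up for illustration. The forked child never opens a connection of its own, yet it can write into the one it inherited, so the peer receives bytes from both processes on a single stream. With SSL layered on top, each process also carries its own copy of the TLS session state (including the record sequence numbers that feed into each record's integrity check); once both processes use the shared socket, those copies drift apart, which is consistent with the "decryption failed or bad record mac" error.

```python
import os
import socket


def child_writes_through_inherited_socket():
    """Show that a forked child writes into the parent's already-open socket."""
    # parent_end plays the client connection; server_end plays the database side.
    parent_end, server_end = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child: it never opened a connection, but it inherited parent_end.
        try:
            parent_end.sendall(b"child|")
        finally:
            os._exit(0)  # leave immediately; never fall through into parent code
    os.waitpid(pid, 0)  # let the child finish so the byte order is deterministic
    parent_end.sendall(b"parent|")
    parent_end.close()  # child's copy is already closed, so the peer will see EOF
    data = b""
    while True:
        chunk = server_end.recv(1024)
        if not chunk:
            break
        data += chunk
    server_end.close()
    return data


if __name__ == "__main__":
    print(child_writes_through_inherited_socket())
```

Both processes' bytes arrive on the same connection. A plain-text protocol merely gets interleaved; an encrypted one fails as soon as the two copies of the session state disagree.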

Close the database connections before creating the pool; Django reopens a connection in each process the first time that process queries the database:

from django import db
db.connections.close_all()
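The "close first, reopen lazily" behaviour can be modelled with a toy connection cache. This is only a sketch: `FakeConnection`, `get_connection` and `close_all` are hypothetical stand-ins (`close_all` loosely mirrors what `db.connections.close_all()` does for this purpose, not Django's actual implementation), and the `fork` start method is requested explicitly because that is the case in which children inherit the parent's state. Each task reports whether it ran on a connection opened by another process.

```python
import multiprocessing
import os

_connection = None  # hypothetical per-process connection cache, loosely like Django's


class FakeConnection:
    """Stand-in for a DB connection that remembers which process opened it."""

    def __init__(self):
        self.opened_by = os.getpid()


def get_connection():
    """Open a connection lazily, the way Django connects on the first query."""
    global _connection
    if _connection is None:
        _connection = FakeConnection()
    return _connection


def close_all():
    """Hypothetical stand-in for db.connections.close_all(): drop the cached connection."""
    global _connection
    _connection = None


def query(_):
    # True when this worker is using a connection opened by a different process.
    return get_connection().opened_by != os.getpid()


def run(close_before_pool):
    get_connection()  # the parent has already used the database
    if close_before_pool:
        close_all()  # nothing left for the forked workers to inherit
    with multiprocessing.get_context("fork").Pool(2) as pool:
        return pool.map(query, range(4))


if __name__ == "__main__":
    print(run(close_before_pool=False))
    print(run(close_before_pool=True))
```

With `close_before_pool=False` every task runs on the parent's connection object; with `True` each worker opens its own on first use. Closing in the parent also means no child has to close an inherited connection itself, which for a real libpq connection would likely still send a termination message to the server over the shared socket.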

Please note that closing the connections like this did not work for me inside a migration, probably because of transactions, so I moved the work to a management command:

from multiprocessing import Pool, cpu_count

import tqdm

from django.core.management.base import BaseCommand
from django import db

from myapp.models import YourModel  # placeholder: use your own app and model


def process(pk):
    # Runs in a worker; the first query opens this worker's own connection.
    obj = YourModel.objects.get(pk=pk)
    ...  # do something with obj


class Command(BaseCommand):
    help = "do something in parallel"

    def handle(self, *args, **options):
        # Pass plain primary keys to the workers, not model instances or querysets.
        ids = list(YourModel.objects.values_list('id', flat=True))
        total = len(ids)

        # Close the parent's DB connections; each worker reconnects on its first query.
        db.connections.close_all()

        with Pool(processes=cpu_count()) as pool:
            for _ in tqdm.tqdm(pool.imap_unordered(process, ids), total=total):
                pass
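The same shape (close the parent's connection, hand the workers primary keys, let each worker open its own connection) can be tried out with the standard library's `sqlite3` in place of Django and PostgreSQL. In this sketch the `item` table, the `_get_conn` helper and `run_parallel` are made up for illustration, and the `fork` start method is assumed. The point is that only plain integers cross the process boundary, never a connection or a row object.

```python
import multiprocessing
import os
import sqlite3
import tempfile

_conn = None  # one sqlite3 connection per worker process, opened lazily


def _get_conn(path):
    """Return this process's own connection, opening it on first use."""
    global _conn
    if _conn is None:
        _conn = sqlite3.connect(path)
    return _conn


def process(args):
    """Fetch one row by primary key using the worker's own connection."""
    path, pk = args
    row = _get_conn(path).execute("SELECT name FROM item WHERE id = ?", (pk,)).fetchone()
    return pk, row[0]


def run_parallel(names):
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    conn = sqlite3.connect(path)
    with conn:  # commits the inserts on success
        conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
        conn.executemany("INSERT INTO item (name) VALUES (?)", [(n,) for n in names])
    ids = [r[0] for r in conn.execute("SELECT id FROM item")]
    conn.close()  # close the parent's connection before starting the workers
    with multiprocessing.get_context("fork").Pool(2) as pool:
        results = pool.imap_unordered(process, [(path, pk) for pk in ids])
        return dict(results)  # consume the iterator before the pool shuts down


if __name__ == "__main__":
    print(run_parallel(["MSFT", "AAPL", "samsung"]))
```

Because results come back unordered, they are collected into a dict keyed by primary key rather than a list.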

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 