
Efficiently collect logs from Heroku and archive them to S3

I have a bunch of Heroku apps (similar apps deployed as multiple instances to serve different customers). Each app generates logs, and I want to archive them to S3.

I tried using Heroku add-ons, but beyond a certain point the price of these add-on services doesn't justify my use case.

Therefore, I tried writing a very simple log drain that converts the log entries into files and publishes them to S3.

Here's the sample code:

    import datetime
    import math
    import random

    import smart_open
    from flask import Flask, request

    app = Flask(__name__)

    @app.route('/drain', methods=['POST'])  # route path is illustrative
    def drain():
        now = datetime.datetime.now()
        r = math.floor(random.random() * 10000)

        # Build a per-request key: timestamp down to milliseconds plus a random suffix
        bucket = 'my-example-bucket'
        key = 'logdrain/raw/{y}/{mon}/{d}/{h}/{h}-{min}-{s}-{ms}-{r}.txt'.format(
            y=now.year, mon=now.month, d=now.day, h=now.hour,
            min=now.minute, s=now.second, ms=now.microsecond // 1000, r=r)

        s3_file_path = 's3://{bucket}/{key}'.format(bucket=bucket, key=key)

        # Write this request's payload to a new file on S3
        data = str(request.data)

        # Logplex reports dropped messages with error code L10
        if 'l10' in data.lower():
            print('ERROR: ', data)

        with smart_open.open(s3_file_path, 'w') as fout:
            fout.write(data)

        return 'Log write successful', 200

But it looks like I'm unable to match logplex's ingestion rate with my consumption/processing speed. Here's the log line:

ERROR:  b'142 <172>1 2019-07-03T10:06:07+00:00 host heroku logplex - Error L10 (output buffer overflow): 6 messages dropped since 2019-07-03T09:43:52+00:00.595 <158>1 2019-07-03T10:06:07.509894+00:00 host heroku router - at=info ...

Here's the documentation confirming this behavior. I just wanted to know if someone has a better approach in mind for implementing the log drain.

PS: I have deployed this as a Flask app on Heroku with 2 enterprise dynos and am still seeing dropped messages.

I think that writing directly to S3 for every incoming drain request is never going to be fast enough to avoid an L10 error. You need two things:

  1. a buffer
  2. non-blocking upload to S3

I'd suggest writing to a local file until a) it reaches a certain size, and/or b) a certain amount of time has elapsed. When the file is thus "finished", upload it to S3 without blocking the main (listening) loop. There are many ways to implement non-blocking upload, but the simplest is probably threading.
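For example, here's a minimal sketch of that idea. The buffer path, size/age thresholds, bucket name, and function names are all assumptions, and boto3 is just one way to do the upload (smart_open would work too):

    import os
    import threading
    import time

    import boto3

    BUFFER_PATH = '/tmp/drain-buffer.log'  # hypothetical local buffer file
    MAX_BYTES = 5 * 1024 * 1024            # assumed size threshold (5 MB)
    MAX_AGE_SECONDS = 60                   # assumed age threshold

    s3 = boto3.client('s3')
    lock = threading.Lock()
    opened_at = time.time()

    def append_to_buffer(data):
        """Append one drain request to the local buffer, rotating when full/old."""
        global opened_at
        with lock:
            with open(BUFFER_PATH, 'ab') as f:
                f.write(data + b'\n')
            too_big = os.path.getsize(BUFFER_PATH) >= MAX_BYTES
            too_old = time.time() - opened_at >= MAX_AGE_SECONDS
            if too_big or too_old:
                finished = BUFFER_PATH + '.' + str(int(time.time() * 1000))
                os.rename(BUFFER_PATH, finished)
                opened_at = time.time()
                # Hand the finished file to a background thread so the
                # request handler never waits on S3
                threading.Thread(target=upload_file, args=(finished,),
                                 daemon=True).start()

    def upload_file(path):
        key = 'logdrain/raw/' + os.path.basename(path)
        s3.upload_file(path, 'my-example-bucket', key)
        os.remove(path)

The drain endpoint then just calls append_to_buffer(request.data) and returns 200 immediately, so ingestion speed is bounded by a local file append rather than an S3 round trip.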

Because Heroku has an ephemeral filesystem and cycles its dynos, you'll need to trap SIGTERM and quickly upload whatever you have buffered before the SIGKILL arrives.
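A minimal sketch of that shutdown hook, reusing the lock, BUFFER_PATH, and upload_file names assumed above:

    import signal
    import sys

    def handle_sigterm(signum, frame):
        # Heroku sends SIGTERM and follows with SIGKILL after a short grace
        # period, so flush synchronously rather than via a background thread
        with lock:
            if os.path.exists(BUFFER_PATH) and os.path.getsize(BUFFER_PATH) > 0:
                finished = BUFFER_PATH + '.' + str(int(time.time() * 1000))
                os.rename(BUFFER_PATH, finished)
                upload_file(finished)
        sys.exit(0)

    signal.signal(signal.SIGTERM, handle_sigterm)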

I'm curious how this works for you as I've been thinking of implementing something similar for my own Heroku apps.
