
AWS Lambda "errorMessage": "[Errno 26] Text file busy"

I wrote my code and uploaded it to AWS Lambda successfully via the AWS SAM CLI. It opens the URL I give it and prints the title of the website. Very beginner-level code. Here it is:

import os, shutil, uuid, time
from selenium import webdriver

def setup():
    BIN_DIR = "/tmp/bin"
    if not os.path.exists(BIN_DIR):
        print("Creating bin folder")
        os.makedirs(BIN_DIR)

    LIB_DIR = '/tmp/bin/lib'
    if not os.path.exists(LIB_DIR):
        print("Creating lib folder")
        os.makedirs(LIB_DIR)
        
    for filename in ['chromedriver', 'headless-chromium', 'lib/libgconf-2.so.4', 'lib/libORBit-2.so.0']:
        oldfile = f'/opt/{filename}'
        newfile = f'{BIN_DIR}/{filename}'
        shutil.copy2(oldfile, newfile)
        os.chmod(newfile, 0o775)

def init_web_driver():
    setup()
    chrome_options = webdriver.ChromeOptions()
    _tmp_folder = '/tmp/{}'.format(uuid.uuid4())

    if not os.path.exists(_tmp_folder):
        os.makedirs(_tmp_folder)

    if not os.path.exists(_tmp_folder + '/user-data'):
        os.makedirs(_tmp_folder + '/user-data')

    if not os.path.exists(_tmp_folder + '/data-path'):
        os.makedirs(_tmp_folder + '/data-path')

    if not os.path.exists(_tmp_folder + '/cache-dir'):
        os.makedirs(_tmp_folder + '/cache-dir')

    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument('--window-size=1280x1696')
    chrome_options.add_argument('--user-data-dir={}'.format(_tmp_folder + '/user-data'))
    chrome_options.add_argument('--hide-scrollbars')
    chrome_options.add_argument('--enable-logging')
    chrome_options.add_argument('--log-level=0')
    chrome_options.add_argument('--v=99')
    chrome_options.add_argument('--single-process')
    chrome_options.add_argument('--data-path={}'.format(_tmp_folder + '/data-path'))
    chrome_options.add_argument('--ignore-certificate-errors')
    chrome_options.add_argument('--homedir={}'.format(_tmp_folder))
    chrome_options.add_argument('--disk-cache-dir={}'.format(_tmp_folder + '/cache-dir'))
    chrome_options.add_argument(
        'user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36')

    chrome_options.binary_location = "/tmp/bin/headless-chromium"

    driver = webdriver.Chrome(chrome_options=chrome_options)
    return driver

def lambda_handler(event, context):
    driver = init_web_driver()
    driver.get("http://www.mjlivesey.co.uk")
    time.sleep(4)
    print(driver.title)

When I click the "Test" button for the first time, I see driver.title in the output. But when I click it again a couple of seconds later, the error below shows up:

{
  "errorMessage": "[Errno 26] Text file busy: '/tmp/bin/chromedriver'",
  "errorType": "OSError",
  "stackTrace": [
    "  File \"/var/task/app.py\", line 61, in lambda_handler\n    driver = init_web_driver()\n",
    "  File \"/var/task/app.py\", line 22, in init_web_driver\n    setup()\n",
    "  File \"/var/task/app.py\", line 18, in setup\n    shutil.copy2(oldfile, newfile)\n",
    "  File \"/var/lang/lib/python3.7/shutil.py\", line 266, in copy2\n    copyfile(src, dst, follow_symlinks=follow_symlinks)\n",
    "  File \"/var/lang/lib/python3.7/shutil.py\", line 121, in copyfile\n    with open(dst, 'wb') as fdst:\n"
  ]
}

If I wait half an hour or more, I can run the code successfully again. I don't understand what the problem is. Maybe you can help me see it.

Thanks.

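It seems that several invocations of the same function are touching the same files, and an earlier invocation is still keeping one of them busy. "Text file busy" (errno 26, ETXTBSY) is raised when a process tries to open for writing an executable that is currently running; here that is /tmp/bin/chromedriver, which shutil.copy2 tries to overwrite.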

To understand this behavior it's useful to understand how the Lambda Execution environment actually works. Basically, if you execute the same function several times in a short span of time, AWS will try to reuse the same execution environment and resources; this saves time and resources.
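A minimal sketch of that reuse, assuming nothing beyond the standard Python handler signature: module-level state is initialised once per execution environment and keeps its value across warm invocations. The counter name and the returned dictionary are hypothetical, for illustration only.

```python
# Module-level code runs once, when the execution environment is created (cold start).
INVOCATION_COUNT = 0


def lambda_handler(event, context):
    # On a warm invocation the module is already loaded, so the counter keeps its value.
    global INVOCATION_COUNT
    INVOCATION_COUNT += 1
    return {"invocations_in_this_environment": INVOCATION_COUNT}
```

The same persistence applies to files in /tmp and to any process left running, such as a chromedriver started by an earlier invocation.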

To draw a parallel, what is happening is similar to running your code locally several times against the same files and folders: each run collides with whatever the previous one left behind.

In your case you should refactor your setup function so that its body runs only once per execution environment.

Also, be mindful that the /tmp directory has a default limit of 512 MB (configurable up to 10,240 MB); once it is full, writes to it fail. If you want to persist your data or need more headroom, consider attaching EFS to your Lambda function.
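To keep an eye on that limit, a small sketch using only the standard library; the helper name tmp_usage_mb is hypothetical.

```python
import shutil


def tmp_usage_mb(path="/tmp"):
    """Return (used, free) space on the filesystem holding `path`, in megabytes."""
    usage = shutil.disk_usage(path)
    return usage.used // (1024 * 1024), usage.free // (1024 * 1024)
```

Printing the result at the start of the handler shows how much the per-invocation Chrome profile folders from the question accumulate while the environment stays warm.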

Just to add to the previous answer: I was using the same code, and I found that deleting the contents of the /tmp/bin folder at the start of setup() let subsequent invocations run.

def setup():
    BIN_DIR = "/tmp/bin"
    if not os.path.exists(BIN_DIR):
        print("Creating bin folder")
        os.makedirs(BIN_DIR)
    else:
        print("Delete all files in folder")
        for filename in os.listdir(BIN_DIR):
            file_path = os.path.join(BIN_DIR, filename)
            try:
                if os.path.isfile(file_path) or os.path.islink(file_path):
                    os.unlink(file_path)
                elif os.path.isdir(file_path):
                    shutil.rmtree(file_path)
            except Exception as e:
                print('Failed to delete %s. Reason: %s' % (file_path, e))
        print("Deleting bin folder")
        os.rmdir(BIN_DIR)
        print("Creating bin folder")
        os.makedirs(BIN_DIR)
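The cleanup above most likely works because Linux allows an executable that is still running to be unlinked, whereas opening it for writing fails with ETXTBSY. A shorter sketch along the same lines, not the answerer's code: copy to a temporary name and rename it over the destination. The helper name install_binary is hypothetical.

```python
import os
import shutil
import tempfile


def install_binary(src, dst, mode=0o775):
    """Replace dst with a copy of src without opening dst for writing."""
    dst_dir = os.path.dirname(os.path.abspath(dst))
    os.makedirs(dst_dir, exist_ok=True)
    # Create the temporary file next to dst so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dst_dir)
    os.close(fd)
    try:
        shutil.copy2(src, tmp_path)
        os.chmod(tmp_path, mode)
        os.replace(tmp_path, dst)  # rename over dst; dst itself is never opened
    except Exception:
        os.unlink(tmp_path)  # do not leave the temporary copy behind
        raise
```

In setup() this would stand in for the shutil.copy2 and os.chmod pair, for example install_binary(oldfile, newfile).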

Just add

driver.quit()

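at the end of your code, so that the chromedriver process from one invocation is no longer running when the next one calls setup().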
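To make sure quit() runs even when the page load raises, it can go in a finally block. A minimal sketch; the helper name fetch_title and its driver_factory parameter are hypothetical, and init_web_driver from the question is what you would pass in.

```python
def fetch_title(driver_factory, url):
    """Open url with a driver from driver_factory and always shut the driver down."""
    driver = driver_factory()
    try:
        driver.get(url)
        return driver.title
    finally:
        # quit() ends the browser session and the chromedriver process,
        # even if get() raised.
        driver.quit()
```

In lambda_handler this would be called as print(fetch_title(init_web_driver, "http://www.mjlivesey.co.uk")).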
