
"Text file busy" error on multithreading python

I have a Python script that downloads shell scripts from Amazon S3 and then executes them (each script is about 3GB in size). The function that downloads and executes a file looks like this:

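import os
import stat

import boto3

def parse_object_key(key):
    # The node name is the second ':::'-separated part of the object key.
    key_parts = key.split(':::')
    return key_parts[1]

def process_file(file):
    client = boto3.client('s3')
    node = parse_object_key(file)
    file_path = "/tmp/" + node + "/tmp.sh"
    # Create the parent directory, not a directory at the script path itself.
    os.makedirs(os.path.dirname(file_path), exist_ok=True)
    client.download_file('category', file, file_path)
    # The owner needs read as well as execute permission to run a shell script.
    os.chmod(file_path, stat.S_IRUSR | stat.S_IXUSR)
    os.system(file_path)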

The node is unique for each file.

I created a for loop to execute this:

s3 = boto3.resource('s3')
bucket = s3.Bucket('category')
for obj in bucket.objects.page_size(count=50):
    process_file(obj.key)  # process_file takes only the key and creates its own client

This works perfectly, but when I try to create a separate thread for each file, I get this error:

sh: 1: /path/to/file: Text file busy

The script with threading looks like this:

s3 = boto3.resource('s3')
bucket = s3.Bucket('category')
threads = []
for obj in bucket.objects.page_size(count=50):
    t = threading.Thread(target=process_file, args=(obj.key,))
    threads.append(t)
    t.start()

for t in threads:
    t.join()

Out of all the threads, exactly one succeeds and all the others fail with the "Text file busy" error. Can someone help me figure out what I am doing incorrectly?

Boto3 is not thread-safe, so you cannot re-use your S3 connection for each download. See here for details of a workaround.
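The linked workaround is not reproduced here, so the following is only a minimal sketch of one common approach, taking the premise above at face value: give every thread its own client instead of sharing one. The names get_thread_client, run_in_threads, and factory are hypothetical helpers introduced for illustration. With boto3, the factory would be something like lambda: boto3.session.Session().client('s3'), so that each thread builds its own session and client.

```python
import threading

# Per-thread storage: each thread sees its own attributes on this object.
_thread_state = threading.local()

def get_thread_client(factory):
    """Return the calling thread's own client, creating it on first use.

    `factory` is a hypothetical zero-argument callable that builds a client,
    e.g. lambda: boto3.session.Session().client('s3').
    """
    if not hasattr(_thread_state, "client"):
        _thread_state.client = factory()
    return _thread_state.client

def run_in_threads(keys, worker, factory):
    """Start one thread per key and wait for all of them to finish.

    Each thread calls worker(key, client) with a client that no other
    thread uses.
    """
    def target(key):
        worker(key, get_thread_client(factory))

    threads = [threading.Thread(target=target, args=(key,)) for key in keys]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

In the question's script, worker would be a variant of process_file that accepts the client as a second argument rather than creating it internally. Whether this alone removes the "Text file busy" error is not confirmed by the thread.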
