Set content type automatically when uploading to Azure Blob Storage

I am trying to upload thousands of files to Azure Storage with Python. By default, their content type is set to 'application/octet-stream'.

I know I can set the content type of an individual file as follows (as specified here):

content_settings = ContentSettings(content_type="something")

The issue is that the content type is not the same for all the files. I have images, JavaScript files, HTML files, and more than a hundred other types.

How do I automatically set the content type based on the local file on my machine before I upload it to an Azure Storage blob using Python?

I know it is doable because azcopy uploads files with the correct content type (I tested this).

Here is the part of the Python code I have written:

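from multiprocessing.pool import ThreadPool
from typing import Any, Dict

from azure.storage.blob import BlobClient, ContentSettings


# Both methods below belong to the uploader class (class definition not shown).
# path_generator and credentials are defined elsewhere.
def upload(self, overwrite=False):
    with ThreadPool(self._num_threads) as pool:
        request_processes = []
        for local_path, remote_path in path_generator():
            blob_client = BlobClient.from_blob_url(blob_url=remote_path, credential=credentials)
            # Problem: this passes the bare extension (e.g. 'html'), not a MIME type.
            file_extension: str = local_path.split('/')[-1].split('.')[-1]
            content_settings = ContentSettings(content_type=file_extension)
            kwargs: Dict[str, Any] = {'blob_type': 'BlockBlob',
                                      'content_settings': content_settings,
                                      'validate_content': True,
                                      'overwrite': overwrite,
                                      'timeout': self._timeout_seconds,
                                      'max_concurrency': 10}
            # Queue the upload on the thread pool without blocking.
            request_processes.append(pool.apply_async(self._sync_upload_file,
                                                      [blob_client, local_path],
                                                      kwds=kwargs))
        # Wait for all queued uploads only after every one has been submitted.
        for req in request_processes:
            req.get(self._timeout_seconds)


@staticmethod
def _sync_upload_file(blob_client, local_path: str, *args, **kwargs):
    # Open the local file in binary mode and upload it as a single blob.
    with open(local_path, "rb") as data:
        resp = blob_client.upload_blob(data, *args, **kwargs)
    return resp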

The issue with this code is that it uses the file extension (js, html, png, etc.) as the content type. For example, the content type of an HTML file should be text/html.

I found one solution on this website.

They created a map (dictionary) from more than 650 file extensions to content types. However, some of my file types are missing from that dictionary, such as ".map", ".config", ".info", and a few more.
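One possible direction, sketched here rather than offered as a definitive answer, is to combine Python's standard-library mimetypes module with a small override table for the extensions that are missing. The names guess_content_type, EXTRA_CONTENT_TYPES and DEFAULT_CONTENT_TYPE are hypothetical, and the MIME types chosen for ".map", ".config" and ".info" are assumptions that should be adjusted to what those files actually contain.

```python
import mimetypes
import os

# Hypothetical overrides for the extensions reported as missing.
# The MIME types here are assumptions, not an authoritative mapping.
EXTRA_CONTENT_TYPES = {
    ".map": "application/json",   # assumes JavaScript source maps
    ".config": "text/plain",      # assumes plain-text configuration
    ".info": "text/plain",        # assumes plain-text metadata
}

# Fallback used when neither the overrides nor mimetypes know the extension.
DEFAULT_CONTENT_TYPE = "application/octet-stream"


def guess_content_type(local_path: str) -> str:
    """Return a MIME type for local_path based on its file extension."""
    extension = os.path.splitext(local_path)[1].lower()
    # Explicit overrides win over the standard-library table.
    if extension in EXTRA_CONTENT_TYPES:
        return EXTRA_CONTENT_TYPES[extension]
    # guess_type returns (type, encoding); type is None when unknown.
    content_type, _encoding = mimetypes.guess_type(local_path)
    return content_type or DEFAULT_CONTENT_TYPE
```

In the upload loop, the result could then be passed as ContentSettings(content_type=guess_content_type(local_path)) instead of the bare extension. Keeping the overrides in a separate dictionary means unknown types can be added one at a time without touching the lookup logic.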
