
Python - getting image name and extension from a URL that does not end with a filename extension

Basically, my goal is to fetch the filename, extension and content of an image from its URL. My function should work for both of these URLs:

easy case: https://image.shutterstock.com/image-photo/bright-spring-view-cameo-island-260nw-1048185397.jpg

hard case (does not end with filename.extension): https://images.unsplash.com/photo-1472214103451-9374bd1c798e?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&w=1000&q=80

Currently, what I have looks like this:

from os.path import splitext, basename
from urllib.parse import urlparse

def get_filename_from_url(url):
    # Split the last component of the URL path into name and extension
    result = urlparse(url)
    filename, file_ext = splitext(basename(result.path))
    print(filename, file_ext)

This works fine for the easy case, but it gives no extension for the hard-case URL. I have a feeling that I can use Python's requests module and parse the response header to find the MIME type, and then use the same module's guesstype functionality to extract the necessary data. So I went on to try this:

import requests

response = requests.get(url, stream=True)

Here, someone seems to describe the clue [screenshot not preserved], but the problem is that with the hard-case URL I get something strange in the response dict items. Maybe my key issue is that I don't know the correct way to parse the response header to extract what I need.

I've tried a third approach using urlparse:

import os
from urllib.parse import urlparse

result = urlparse(url)
print(os.path.basename(result.path))  # 'photo-1472214103451-9374bd1c798e'

which yields the filename, but again the extension is missing.

The ideal solution would be to get the filename, file extension and file content in one go, preferably while also validating that the URL actually points to an image and not something else.

UPD:

The result[1] element in result = urllib.request.urlretrieve(self.url) seems to contain the Content-Type, but I can't figure out how to extract it correctly.

One way is to query the content type:

>>> from urllib.request import urlopen
>>> response = urlopen(url)
>>> response.info().get_content_type()
'image/jpeg'

or using urlretrieve as in your edit:

>>> import urllib.request
>>> response = urllib.request.urlretrieve(url)
>>> response[1].get_content_type()
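
To get from the content type to an extension, and to cover the "one go" part of the question, something along these lines might work. It is only a minimal sketch: the function names name_and_extension and fetch_image are made up for illustration, and it assumes the server sends an honest Content-Type header. The standard library's mimetypes.guess_extension does the MIME-type-to-extension mapping, and it is only used when the URL path itself has no extension.

```python
import mimetypes
from os.path import basename, splitext
from urllib.parse import urlparse
from urllib.request import urlopen


def name_and_extension(url, content_type):
    """Derive (name, extension) from a URL and a Content-Type value.

    Hypothetical helper: raises ValueError when the type is not image/*.
    """
    # Drop parameters such as "; charset=..." and normalise the case,
    # so a raw header value is accepted as well.
    mime_type = content_type.split(";")[0].strip().lower()
    if not mime_type.startswith("image/"):
        raise ValueError(f"not an image: {content_type!r}")
    # Only the path is used, so the query string never leaks into the name.
    name, ext = splitext(basename(urlparse(url).path))
    if not ext:
        # No extension in the URL: fall back to the MIME type.
        ext = mimetypes.guess_extension(mime_type) or ""
    return name, ext


def fetch_image(url):
    """Download url and return (name, extension, content_bytes)."""
    with urlopen(url) as response:
        content_type = response.info().get_content_type()
        # Validate before reading the body, so non-images are rejected early.
        name, ext = name_and_extension(url, content_type)
        return name, ext, response.read()
```

Calling fetch_image(url) with either URL from the question should then give the name, the extension and the raw bytes. If you prefer requests, the same header should be available as response.headers.get("Content-Type") and can be passed to name_and_extension unchanged. Note that the check trusts the header only; it does not inspect the bytes themselves.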
