How to get the width and height of all jpg files in a folder
I have a folder poster_folder containing jpg files, for example 1.jpg, 2.jpg, 3.jpg.
The path to this folder is:
from pathlib import Path
from PIL import Image
images_dir = Path('C:\\Users\\HP\\Desktop\\PGDinML_AI_IIITB\\MS_LJMU\\Dissertation topics\\Project_2_Classification of Genre for Movies using Machine Leaning and Deep Learning\\Final_movieScraping_data_textclasification\\posters_final').expanduser()
I have a data frame with the jpg image info:
df_subset_cleaned_poster.head(3)
movie_name movie_image
Lion_king 1.jpg
avengers 2.jpg
iron_man 3.jpg
I am trying to plot a scatter of the width and height of all jpg files in the folder (they have different resolutions), as below:
height, width = np.empty(len(df_subset_cleaned_poster)), np.empty(len(df_subset_cleaned_poster))
for i in range(len(df_subset_cleaned_poster.movie_image)):
    w, h = Image.open(images_dir.joinpath(df_subset_cleaned_poster['movie_image'][i])).size
    width[i], height[i] = w, h
plt.scatter(width, height, alpha=0.5)
plt.xlabel('Width'); plt.ylabel('Height'); plt.show()
This throws an error: KeyError: 208
df_subset_cleaned_poster.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10225 entries, 0 to 10986
Data columns (total 2 columns):
movie_name 10225 non-null object
movie_image 10225 non-null object
dtypes: object(2)
As discussed in the comments: the issue seems to be in the creation of the dataframe or in the csv file itself.
I was able to create a proper scatter plot with the following code:
from pathlib import Path
import numpy as np
import pandas as pd
from PIL import Image
import matplotlib.pyplot as plt
from io import StringIO
if __name__ == '__main__':
    images_dir = Path("../data/images")
    infile = StringIO("""movie_name,movie_image
Lion_king,1.jpg
avengers,2.jpg
iron_man,3.jpg
""")
    df_subset_cleaned_poster = pd.read_csv(infile)
    n = len(df_subset_cleaned_poster)
    height, width = np.empty(n), np.empty(n)
    for i, filename in enumerate(df_subset_cleaned_poster.movie_image):
        w, h = Image.open(images_dir / filename).size
        width[i], height[i] = w, h
    plt.scatter(width, height, alpha=0.5)
    plt.xlabel('Width')
    plt.ylabel('Height')
    plt.show()
I suggest that you use this code as the starting point for further experiments. I am using enumerate to iterate over all rows in df_subset_cleaned_poster.movie_image. On its own, this should be more robust against lookup errors such as the KeyError above, because the loop counter no longer has to match the dataframe's index labels.
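One likely explanation for the KeyError, given that info() reports 10225 entries with index labels running from 0 to 10986: rows were dropped at some point, so the index has gaps, and df['movie_image'][i] looks up the label i rather than the i-th row. The following is a minimal sketch of that behaviour, not the asker's actual data; the small frame and the helper name lookup_by_label are made up for illustration.

```python
import pandas as pd

# Hypothetical frame mimicking the question: one row was dropped during
# cleaning, so the integer index labels have a gap (label 1 is missing).
df = pd.DataFrame(
    {
        "movie_name": ["Lion_king", "avengers", "iron_man"],
        "movie_image": ["1.jpg", "2.jpg", "3.jpg"],
    },
    index=[0, 2, 3],
)


def lookup_by_label(frame, i):
    """Same access pattern as in the question: label-based, not positional."""
    return frame["movie_image"][i]


try:
    lookup_by_label(df, 1)  # there is no row labelled 1
    failed = False
except KeyError:
    failed = True

# Three ways to avoid the label lookup:
by_position = [df["movie_image"].iloc[i] for i in range(len(df))]  # positional access
by_iteration = list(df["movie_image"])                             # plain iteration
after_reset = df.reset_index(drop=True)["movie_image"][1]          # relabel as 0..n-1
```

If the loop from the question has to stay as it is, calling reset_index(drop=True) on the dataframe once before the loop should be enough, assuming the gaps in the index are indeed the cause.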
As you can see, I replaced the infile with a mock string wrapped in StringIO. Just replace it with infile = open("your_file.txt") to use the real data again.
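If the dataframe is not needed at all, the sizes can also be read straight from the folder. This is a sketch under the assumption that all posters sit directly inside the folder and end in .jpg; the function name collect_jpg_sizes is hypothetical and not part of Pillow.

```python
from pathlib import Path

from PIL import Image


def collect_jpg_sizes(folder):
    """Return {filename: (width, height)} for every .jpg directly inside folder."""
    sizes = {}
    for path in sorted(Path(folder).glob("*.jpg")):
        # The context manager closes the file handle after the size is read.
        with Image.open(path) as img:
            sizes[path.name] = img.size  # Pillow reports size as (width, height)
    return sizes
```

With the question's folder this would be called as sizes = collect_jpg_sizes(images_dir), and the widths and heights for plt.scatter can then be taken from sizes.values().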