How to get the width and height of all jpg images in a folder

Question

I have a folder, poster_folder, containing jpg files, e.g. 1.jpg, 2.jpg, 3.jpg.

The path to the folder is:

from pathlib import Path
from PIL import Image

images_dir = Path('C:\\Users\\HP\\Desktop\\PGDinML_AI_IIITB\\MS_LJMU\\Dissertation topics\\Project_2_Classification of Genre for Movies using Machine Leaning and Deep Learning\\Final_movieScraping_data_textclasification\\posters_final').expanduser()

I have a dataframe with information about the jpg images:

df_subset_cleaned_poster.head(3)

movie_name  movie_image

Lion_king   1.jpg
avengers    2.jpg
iron_man    3.jpg

I am trying to draw a scatter plot of the width and height of all the jpg files in the folder (since they have different resolutions), as follows:

height, width = np.empty(len(df_subset_cleaned_poster)), np.empty(len(df_subset_cleaned_poster))

for i in range(len(df_subset_cleaned_poster.movie_image)):
    w, h = Image.open(images_dir.joinpath(df_subset_cleaned_poster['movie_image'][i])).size
    width[i], height[i] = w, h
plt.scatter(width, height, alpha=0.5)
plt.xlabel('Width'); plt.ylabel('Height'); plt.show()

This throws an error: KeyError: 208

df_subset_cleaned_poster.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10225 entries, 0 to 10986
Data columns (total 2 columns):
movie_name                  10225 non-null object
movie_image                 10225 non-null object
dtypes: object(2)

Answer 1

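As discussed in the comments: the problem seems to lie in how the dataframe or the csv file itself was created.

I was able to produce a correct scatter plot with the following code: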


from pathlib import Path

import numpy as np
import pandas as pd
from PIL import Image
import matplotlib.pyplot as plt
from io import StringIO

if __name__ == '__main__':
    images_dir = Path("../data/images")

    infile = StringIO("""movie_name,movie_image
Lion_king,1.jpg
avengers,2.jpg
iron_man,3.jpg
""")

    df_subset_cleaned_poster = pd.read_csv(infile)

    n = len(df_subset_cleaned_poster)
    height, width = np.empty(n), np.empty(n)

    for i, filename in enumerate(df_subset_cleaned_poster.movie_image):
        w, h = Image.open(images_dir / filename).size
        width[i], height[i] = w, h

    plt.scatter(width, height, alpha=0.5)
    plt.xlabel('Width')
    plt.ylabel('Height')
    plt.show()

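I suggest you use this code as a starting point for further experiments. I use enumerate to iterate over all rows in df_subset_cleaned_poster.movie_image. This should be more robust against index-related errors such as the KeyError above: the dataframe's index has gaps (10225 entries with labels running from 0 to 10986), so df_subset_cleaned_poster['movie_image'][i] looks up the label i, which may not exist, whereas enumerate simply counts positions.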
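To illustrate why positional iteration helps, here is a minimal sketch. It assumes the KeyError comes from gaps in the dataframe index, which the info() output suggests but does not prove. The three-row frame and the helper name lookup_by_label are made up for illustration.

```python
import pandas as pd

# Hypothetical frame; dropping a row leaves a gap in the integer index.
df = pd.DataFrame({
    "movie_name": ["Lion_king", "avengers", "iron_man"],
    "movie_image": ["1.jpg", "2.jpg", "3.jpg"],
})
df = df.drop(index=1)  # remaining index labels: 0 and 2


def lookup_by_label(frame, i):
    """Mimic the question's frame['movie_image'][i], a label-based lookup."""
    return frame["movie_image"][i]


# range(len(df)) yields 0 and 1, but label 1 no longer exists.
try:
    lookup_by_label(df, 1)
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# enumerate counts positions and never consults the index labels.
filenames = [name for _, name in enumerate(df.movie_image)]

# Alternative: renumber the index so that labels match positions again.
df_reset = df.reset_index(drop=True)
```

Either approach should avoid the failing lookup; reset_index(drop=True) may be the smaller change if the original loop is to be kept as it is.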

As you can see, I replaced infile with a mock string wrapped in StringIO. Simply replace it with infile = open("your_file.txt") to work with your real data again.
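If the goal is simply the size of every jpg in the folder, as the title suggests, the sizes can also be read straight from the directory without going through the dataframe. The following is only a sketch: the function name jpg_sizes is an assumption, and it matches files by the literal pattern *.jpg, so other spellings of the extension would need extra patterns.

```python
from pathlib import Path

from PIL import Image


def jpg_sizes(folder):
    """Return a dict mapping each *.jpg file name in folder to (width, height)."""
    sizes = {}
    for path in sorted(Path(folder).glob("*.jpg")):
        # The context manager closes the file once the size has been read.
        with Image.open(path) as img:
            sizes[path.name] = img.size
    return sizes
```

For a non-empty result, widths, heights = zip(*jpg_sizes(images_dir).values()) gives two sequences that can be passed to plt.scatter in the same way as the width and height arrays above.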


Solution 1
1 Accepted 2020-04-13 16:25:52
