How do I use pandas to load a dataset file that has folder names and image names but no ID column?

Question

The file I am using is a text file in the format shown below. The first column represents the folder name. Here is a sample.

0010\\0010_01_05_03_115.jpg
0010\\0010_01_05_03_121.jpg
0010\\0010_01_05_03_125.jpg

How can I load it into my program? At the moment I get this error:

    img=image.load_img('TrainImages/' +TrainImages['id'][i].astype('str')+'.png', target_size=(2, 8, 28, 1),grayscale=False)
  File "C:\\Anaconda\\lib\\site-packages\\pandas\\core\\frame.py", line 2927, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\\Anaconda\\lib\\site-packages\\pandas\\core\\indexes\\base.py", line 2659, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'id'

I am trying to create a training data set by reading in a file and applying some preprocessing to it before doing the rest.

This is the code I tried, and I am not sure if it is correct:

TrainImages=pd.read_csv('client_train_raw.txt')
train_image =[]
for i in tqdm(range(TrainImages.shape[0])):
    img=image.load_img('TrainImages/' +TrainImages['id'] 
      [i].astype('str')+'.png', target_size=(2, 8, 28, 1),grayscale=False)
    img = image.img_to_array(img)

Answer 1

You haven't told your dataframe what 'id' means. It looks like your data file only has one column, the file path separated by '\\'. By default, read_csv treats the first line of the file as the header row, so the only column ends up named after the first path and there is no 'id' column to look up. You should be able to fix this with:

train_images = pd.read_csv('client_train_raw.txt', header=None, names=['id'])

This will label the single column in your dataframe as 'id' and you'll stop getting that error. I think there are still going to be some issues with how you are handling file paths, and I'm not sure that the [i] in TrainImages['id'][i].astype('str') is doing what you think it is.
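As a rough illustration of the read_csv fix, here is a minimal sketch that uses an in-memory stand-in for client_train_raw.txt. The sample contents, the variable names first and paths, and the assumption that the real file uses a single backslash as the separator (with the doubled backslash in the listing being an escaping artifact) are mine, not the asker's.

```python
import io

import pandas as pd

# Hypothetical stand-in for client_train_raw.txt: one relative path per line,
# no header row. Assumes a single backslash separates folder and file name.
sample = io.StringIO(
    "0010\\0010_01_05_03_115.jpg\n"
    "0010\\0010_01_05_03_121.jpg\n"
    "0010\\0010_01_05_03_125.jpg\n"
)

# header=None: the first line is data, not column names.
# names=["id"]: give the single column the label the loop expects.
train_images = pd.read_csv(sample, header=None, names=["id"])

# Indexing one row of the column yields a plain Python str,
# so calling .astype('str') on it is unnecessary (str has no such method).
first = train_images["id"][0]

# Build one path per row, swapping backslashes for forward slashes.
paths = ["TrainImages/" + p.replace("\\", "/") for p in train_images["id"]]
print(paths)
```

Note that the sample entries already end in .jpg, so appending '.png' as in the original loop would probably point at files that do not exist.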

Also, you probably don't need pandas for this read. Since each line in your file is a path to an image, you could just use:

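with open('client_train_raw.txt', 'r') as a_file:
    for idx, line in enumerate(a_file):
        # Each line will be a path to a data file; strip() drops the trailing newline.
        # str(idx) is needed because a str and an int cannot be concatenated.
        img = image.load_img('TrainImages/' + line.strip() + str(idx) + '.png', ...)
        img = image.img_to_array(img)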

or something like that, but I'm not sure what the idx here should be doing.
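The plain-file approach could be sketched with the standard library alone, along the following lines. The helper name read_image_paths, the root folder default, and the throwaway list file are hypothetical, and the sketch assumes each line holds a Windows-style relative path with a single backslash.

```python
import tempfile
from pathlib import Path, PureWindowsPath


def read_image_paths(list_file, root="TrainImages"):
    """Return one image path per non-empty line of list_file, joined onto root."""
    paths = []
    with open(list_file, "r") as a_file:
        for line in a_file:
            rel = line.strip()  # drop the trailing newline
            if not rel:
                continue  # skip blank lines
            # PureWindowsPath splits on backslashes regardless of the host OS.
            parts = PureWindowsPath(rel).parts
            paths.append(Path(root).joinpath(*parts))
    return paths


# Demo with a throwaway list file standing in for client_train_raw.txt.
with tempfile.TemporaryDirectory() as tmp:
    list_file = Path(tmp) / "client_train_raw.txt"
    list_file.write_text(
        "0010\\0010_01_05_03_115.jpg\n0010\\0010_01_05_03_121.jpg\n"
    )
    image_paths = read_image_paths(list_file)

print([p.as_posix() for p in image_paths])
```

Each entry can then be converted with str(path) and passed to image.load_img in place of the hand-built string. The line index is not used here because each line already identifies one file.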

Answer 1
0 Accepted 2019-05-10 14:58:03
