How do I load a dataset file that has a folder name and image name but does not contain an id, in Python using pandas?

Question

The file I am using is a text file in the format shown below. The first part of each line is the folder name. Here is a sample:

0010\0010_01_05_03_115.jpg
0010\0010_01_05_03_121.jpg
0010\0010_01_05_03_125.jpg

How can I load it into my program? At the moment I get this error:

    img=image.load_img('TrainImages/' +TrainImages['id'][i].astype('str')+'.png', target_size=(2, 8, 28, 1),grayscale=False)
  File "C:\Anaconda\lib\site-packages\pandas\core\frame.py", line 2927, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\Anaconda\lib\site-packages\pandas\core\indexes\base.py", line 2659, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'id'

I am actually trying to create a training data set by reading in a file and applying some preprocessing to it before doing the rest.

This is the code I tried, and I am not sure if it is correct:

TrainImages=pd.read_csv('client_train_raw.txt')
train_image =[]
for i in tqdm(range(TrainImages.shape[0])):
    img = image.load_img('TrainImages/' + TrainImages['id'][i].astype('str') + '.png', target_size=(2, 8, 28, 1), grayscale=False)
    img = image.img_to_array(img)

Answer 1

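You haven't told your dataframe what 'id' means. It looks like your data file only has one column, the file path separated by '\\'. By default, read_csv also treats the first line of the file as the header, so your only column ends up named after the first path and there is no 'id' column to look up. You should be able to fix this with: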

train_images = pd.read_csv('client_train_raw.txt', header=None, names=['id'])

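This labels the single column in your dataframe as 'id', so the KeyError goes away. I think there will still be some issues with how you are handling file paths: each entry already ends in .jpg, so appending '.png' gives a path that won't match your files. I'm also not sure that the [i] in TrainImages['id'][i].astype('str') is doing what you think it is: it returns a single Python string, and a string has no .astype method.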
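
To make the read_csv fix concrete, here is a minimal sketch. It uses an in-memory stand-in for client_train_raw.txt built from the sample lines in the question, and it assumes the folder and file name are separated by a single backslash. The 'folder' and 'filename' column names are my own, purely for illustration.

```python
import io

import pandas as pd

# In-memory stand-in for client_train_raw.txt (sample lines from the question).
# "\\" in a Python string literal is one backslash character.
sample = io.StringIO(
    "0010\\0010_01_05_03_115.jpg\n"
    "0010\\0010_01_05_03_121.jpg\n"
    "0010\\0010_01_05_03_125.jpg\n"
)

# header=None: the file has no header row, so keep the first line as data.
# names=['id']: label the single column so that ['id'] lookups work.
train_images = pd.read_csv(sample, header=None, names=["id"])

# Each element is already a plain Python str, so no .astype('str') is needed.
first_path = train_images["id"][0]

# Optional: split each entry into folder and file name on the backslash.
# The column names 'folder' and 'filename' are hypothetical.
parts = train_images["id"].str.split("\\", n=1, expand=True)
train_images["folder"] = parts[0]
train_images["filename"] = parts[1]

print(train_images)
```

With the real file you would pass 'client_train_raw.txt' instead of the StringIO object.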

Also you probably don't need to use Pandas for this read. Since each line in your file is a path to an image, you could just use:

with open('client_train_raw.txt', 'r') as a_file:
    for line in a_file:
        # Each line is a path to an image file; strip the trailing newline.
        path = 'TrainImages/' + line.strip()
        img = image.load_img(path, ...)
        img = image.img_to_array(img)

or something similar. Each line already contains the full file name with its extension, so I don't think an index or an extra '.png' is needed.
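
If the backslashes in the file cause trouble (for example when running on Linux or macOS), one option is to let pathlib split them. The sketch below is only an illustration: read_image_paths is a hypothetical helper name, 'TrainImages' is the root folder taken from the question, and the demo writes a small temporary file instead of touching your real list.

```python
import os
import tempfile
from pathlib import Path, PureWindowsPath


def read_image_paths(list_file, root="TrainImages"):
    """Return one path per non-empty line of list_file, joined onto root."""
    paths = []
    with open(list_file, "r") as a_file:
        for line in a_file:
            entry = line.strip()  # drop the trailing newline
            if not entry:
                continue  # skip blank lines
            # PureWindowsPath splits on backslashes on any operating system.
            parts = PureWindowsPath(entry).parts
            paths.append(Path(root).joinpath(*parts))
    return paths


# Demo with a temporary stand-in for client_train_raw.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("0010\\0010_01_05_03_115.jpg\n0010\\0010_01_05_03_121.jpg\n")
    list_file = tmp.name

paths = read_image_paths(list_file)
os.remove(list_file)

print(paths[0].as_posix())  # TrainImages/0010/0010_01_05_03_115.jpg
```

Each resulting path can then be passed as a string to image.load_img. As far as I know, target_size there expects a (height, width) pair, so the four-element tuple from the question would likely need changing too.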

1 answer

solution1
0 ACCEPTED 2019-05-10 14:58:03