For-loop in a list with datasets in Python

Question

I have 32 datasets with the same structure. I need to clean each one and then join them. For the cleaning I wrote a function and then tried to call that function in a loop, but it doesn't work.

Here is my code.

First: I imported the datasets into my environment as a list called files. I'm working in Google Colab.

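import glob
import os
import pandas as pd

os.chdir('/content')

extension = 'xls'
# glob.glob already returns a list of matching file names
all_files = glob.glob('*.{}'.format(extension))

files = []
for filename in all_files:
  # Skip the first 6 rows of each sheet before reading the data
  data = pd.read_excel(filename, skiprows=6)
  files.append(data)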

Second: I wrote my cleaning function.

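def data_cleaning(data):
  # Keep rows from the third one onward and the first four columns
  data = data.iloc[2:, :4]
  # Drop the rows whose Desglose value is 'Localidades'
  data = data[(data['Desglose'] != 'Localidades')]
  data = data.drop(columns='Desglose')
  data = data.rename(columns={'Total de localidades y su población1': 'poblacion'})
  # Keep only the text after the first space in each name column
  data['municipio'] = data['Municipio'].str.split(' ', n = 1).str.get(-1)
  data['entidad_federativa'] = data['Entidad federativa'].str.split(' ', n = 1).str.get(-1)
  # Drop the two original name columns, keeping the cleaned ones
  data = data.iloc[:, 2:]
  return data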

Finally: I tried to write a for loop that repeats the cleaning process on each dataset in the list files.

files_clean = []
for i in files:
  data_clean = data_cleaning(files[i])
  files_clean.append(data_clean)

The error I get is:

TypeError                                 Traceback (most recent call last)
<ipython-input-44-435517607919> in <module>()
      1 files_clean = []
      2 for i in files:
----> 3   data_clean = data_cleaning(files[i])
      4   files_clean.append(data_clean)

TypeError: list indices must be integers or slices, not DataFrame

I've done a similar process in R but I can't reproduce it in Python. Any suggestions would be appreciated.

Thank you very much for your time.

Answer 1

The error TypeError: list indices must be integers or slices, not DataFrame is raised when you index a list with a DataFrame instead of an integer. A list can only be indexed with an integer position or a slice.

This usually happens when you iterate directly over a list and then use the loop variable as an index. In for i in files, i is each DataFrame in turn, not its position, so files[i] fails. If you want to index by position, iterate over range(len(files)) instead:

for i in range(len(files)):

If you are unsure what the loop variable holds, check type(files) and the type of one element, such as type(files[0]), and adjust the loop accordingly.
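The difference between iterating over elements and iterating over positions can be sketched with plain strings standing in for the DataFrames. The list items below is hypothetical and only serves to reproduce the same kind of error without pandas.

```python
# Hypothetical stand-in for the list of DataFrames.
items = ["a", "b", "c"]

# Iterating over a list yields its elements, not their positions.
seen = [item for item in items]

# Using an element as an index raises the same kind of TypeError.
message = ""
try:
    for item in items:
        items[item]
except TypeError as exc:
    message = str(exc)

# Iterating over range(len(...)) yields integer positions instead.
by_index = [items[i] for i in range(len(items))]
```

Here message holds the text of the TypeError, and by_index ends up with the same elements as items.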

Answer 2

The problem is with the index. In for i in files, i is not an integer but each DataFrame in the list, so files[i] raises the TypeError. The simplest fix is to pass the loop variable straight to the function:

for df in files:
  data_clean = data_cleaning(df)
  files_clean.append(data_clean)

or, if you prefer to index by position,

for i in range(len(files)):
  data_clean = data_cleaning(files[i])
  files_clean.append(data_clean)

or, if you need both the position and the DataFrame,

for i, df in enumerate(files):
  # i is the position, df is the DataFrame at that position
  data_clean = data_cleaning(df)
  files_clean.append(data_clean)
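Since the question also mentions joining the datasets afterwards, here is a minimal sketch of one way to do both steps, assuming the cleaned DataFrames share the same columns. The two small DataFrames and the placeholder data_cleaning below are hypothetical stand-ins for the 32 Excel files and the real cleaning function.

```python
import pandas as pd

# Hypothetical stand-ins for the DataFrames read from Excel.
files = [
    pd.DataFrame({"municipio": ["A", "B"], "poblacion": [10, 20]}),
    pd.DataFrame({"municipio": ["C"], "poblacion": [30]}),
]

def data_cleaning(data):
    # Placeholder for the cleaning function from the question.
    return data.copy()

# Clean every DataFrame, iterating over the elements directly.
files_clean = [data_cleaning(df) for df in files]

# Stack the cleaned DataFrames into one, renumbering the rows.
combined = pd.concat(files_clean, ignore_index=True)
```

The list comprehension is equivalent to the first loop above. ignore_index=True gives the combined DataFrame a fresh 0..n-1 index instead of repeating each file's own row labels.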
