I have 32 datasets with the same structure. I need to do some preparation on each one and then join them. For the cleaning I wrote a function and then tried to call it in a loop, but it doesn't work.
Here is my code.
First, I imported the datasets into my environment as a list called files. I'm working in Google Colab.
import glob
import os
import pandas as pd

os.chdir('/content')
extension = 'xls'
all_files = glob.glob('*.{}'.format(extension))

files = []
for filename in all_files:
    # Skip the first 6 rows of each sheet before reading the data
    data = pd.read_excel(filename, skiprows=6)
    files.append(data)
Second, I wrote my cleaning function.
def data_cleaning(data):
    data = data.iloc[2:, :4]
    data = data[(data['Desglose'] != 'Localidades')]
    data = data.drop(columns='Desglose')
    data = data.rename(columns={'Total de localidades y su población1': 'poblacion'})
    data['municipio'] = data['Municipio'].str.split(' ', n = 1).str.get(-1)
    data['entidad_federativa'] = data['Entidad federativa'].str.split(' ', n = 1).str.get(-1)
    data = data.iloc[:, 2:]
    return data
And finally, I tried to write a for loop to repeat the cleaning process on each dataset in the list files.
files_clean = []
for i in files:
    data_clean = data_cleaning(files[i])
    files_clean.append(data_clean)
The error I get is:
TypeError                                 Traceback (most recent call last)
<ipython-input-44-435517607919> in <module>()
      1 files_clean = []
      2 for i in files:
----> 3     data_clean = data_cleaning(files[i])
      4     files_clean.append(data_clean)

TypeError: list indices must be integers or slices, not DataFrame
I've done a similar process in R but I can't reproduce it in Python, so any suggestions would be appreciated.
Thank you very much for your time.
The error TypeError: list indices must be integers or slices, not DataFrame is raised when you index a list with a DataFrame instead of an integer. To solve it, make sure you index the list with an integer.
A common scenario where this error is raised is iterating over a list with for i in files and then using the loop variable as an index: i is then an element of the list, not a position. To get integer positions, you can use range() in the for loop:
for i in range(len(files)):
Alternatively, check the type of files and the type of one of its elements, and adjust the code accordingly.
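As a rough illustration of the point above, the following sketch shows what a for loop over a list actually yields and why using that value as an index fails. The list of strings is only a stand-in for the list of DataFrames (an assumption made to keep the example dependency-free), so the type named in the message is str rather than DataFrame.

```python
# Stand-in for the list of DataFrames from the question (hypothetical values).
files = ["df_a", "df_b", "df_c"]

# Iterating a list yields its elements, not their positions.
for item in files:
    print(type(item).__name__)  # str here; DataFrame in the question

# Using an element as an index raises the same kind of TypeError.
error_message = None
try:
    files[files[0]]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# range(len(...)) yields integer positions, which are valid indices.
by_index = [files[i] for i in range(len(files))]
```

Printing type(files) and type(files[0]) in the original notebook is a quick way to confirm which of the two cases applies.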
The problem is with the index. for i in files does not yield i as an integer but as a DataFrame. A possible solution to your problem is:
for df in files:
    data_clean = data_cleaning(df)
    files_clean.append(data_clean)
or similarly
for i in range(len(files)):
    data_clean = data_cleaning(files[i])
    files_clean.append(data_clean)
or possibly
for i, df in enumerate(files):
    data_clean = data_cleaning(files[i])
    files_clean.append(data_clean)
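Since the question also mentions joining the datasets afterwards, here is a minimal sketch of how the cleaned list might be combined, assuming the frames share the same columns after cleaning. The two small DataFrames and the simplified data_cleaning below are hypothetical placeholders, not the data or the function from the question.

```python
import pandas as pd

# Hypothetical stand-ins for the DataFrames loaded in the question.
files = [
    pd.DataFrame({"municipio": ["A", "B"], "poblacion": [10, 20]}),
    pd.DataFrame({"municipio": ["C"], "poblacion": [30]}),
]

def data_cleaning(data):
    # Placeholder cleaning step; the real function is the one from the question.
    return data[data["poblacion"] > 10]

# Apply the cleaning function to every DataFrame in the list.
files_clean = [data_cleaning(df) for df in files]

# Stack the cleaned frames vertically and renumber the rows from 0.
combined = pd.concat(files_clean, ignore_index=True)
```

The list comprehension is equivalent to the first loop above; pd.concat with ignore_index=True discards the per-file row labels, which may or may not be what you want.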